Turning point


The result of next week’s crucial UK referendum on whether or not to remain in the European Union
will have worldwide repercussions.


The people of the United Kingdom will next week vote to either
leave or remain in the European Union. At stake is not only the
future of the United Kingdom and its place in the world, but
also the future of Europe itself.

For science and research, the benefits that flow from being part
of the EU are obvious. Free movement of people makes it easier
for researchers in one EU state to live and work in others, which
in turn promotes access to a plethora of multi-country collaborations.
Belonging to the EU gives member states ready access to
a huge pool of diverse scientific expertise and shared research
facilities (see page 307).

The EU itself will spend more than €120 billion (US$135 billion)
between 2014 and 2020 on research, collaboration and innovation,
including around €40 billion in beefing up scientific infrastructure
in its poorer regions. Some €13 billion will go to one of the
EU’s greatest research successes, the highly competitive European
Research Council, created in 2007 to award research grants to
scientists of any nationality. Not surprisingly perhaps, a Nature survey
in March showed that an overwhelming majority of UK researchers
are in favour of remaining. Leading scientists from many disciplines
have taken to the pages of newspapers and to the airwaves to
plead the case for staying in the EU, making science a theme of the
political campaign.


COOPERATION 

The benefits of EU regulations to research and innovation in the life 
sciences were highlighted in a report published on 11 June by the 
UK House of Commons Science and Technology Committee. But it 
also noted shortcomings, for example in the translating of EU legisla- 
tion into national laws. Some countries — Britain included — often 
implement national laws that go over and above that required by the 
EU (a practice known as gold-plating), resulting in variation between 
countries. The report also argued that the EU’s application of the 
‘precautionary principle’ in regulations needs to be more closely based 
on robust scientific evidence. 

Scientists in Britain and elsewhere will have their own complaints 
about the way the EU works. But the UK referendum should not be 
a vote on whether or not the EU is perfect — how could it be? The 
question must be whether the unique system of cooperation that it 
represents does what it sets out to do. 

It is Nature’s view that when it comes to science and science-based 
regulation, the EU is much greater than the sum of its parts. Over time, 
it has replaced a maze of regulations and technical standards in its 
28 member states — on everything from the life sciences to car parts 
— with common EU-wide regulations. Its environmental-protection 
laws are also widely recognized as world-leading. 

Such cooperation has helped Europe to become the research and 
economic powerhouse that it is today. And the strength of UK science
has allowed Britain to have an outsized say in shaping EU research
and regulations. Outside the EU, its influence would be greatly 
diminished. 

Many of those who have been pushing for Britain to leave 
complain of diminished sovereignty. But in the modern globalized 
world, a willingness to pool aspects of sovereignty is the only way for a 
country such as the United Kingdom to have any strong say in shaping 
international rules, from financial regulation to air pollution. Climate 
change, the environment, use of natural resources, energy security 
and sustainable agriculture: all are examples of science-based issues 
on which Europe can be much more effective as a bloc than any member
state alone — not to mention countering terrorism, or managing the
potential threat of Russia on Europe’s eastern flank. At a time when so
many of Europe’s most important challenges are increasingly regional and
global, it is time to build a better, stronger EU, not tear it down.
The ‘Brexit’ camp insists that a split from the EU will
allow Britain to make more of its own decisions. It might, but many of
those decisions would carry much less weight.

It is difficult to get multiple nation states to agree to sacrifice some 
autonomy for what is in their collective interest. It requires hard work 
and, of course, often plodding negotiation and compromise. Britain 
undervalues that effort at its peril. 

Built from the ruins of a Europe devastated by the Second World 
War, the EU has, despite its defects, woven together often-fractious, 
if not belligerent, nations into a bloc that has secured peace and 
democracy and has helped to build a Europe that has common 
values and rights. It has also managed to peacefully assimilate many 
former Soviet states under the democratic and societal obligations of 
the EU umbrella. 

Continued engagement of the United Kingdom in the EU is vital, 
and its citizens bear a heavy responsibility on 23 June. So do the rep- 
resentatives on both sides of the debate, who have tended to stray 
into hyperbole and exaggeration. For example, a central claim of the 
‘Leave’ campaign has been that a Brexit would free up £350 million 
(US$500 million) a week that could be spent on the National Health 
Service and other public services. This is simply false. That figure is 
Britain's gross contribution to the EU; when the money Britain receives 
back is taken into account, it is less than £250 million a week. The 
reality is that the United Kingdom is in full control of the vast major- 
ity of its public spending; its net contribution to the EU budget 
was around £8.8 billion, or slightly more than 1% of its total public 
spending of £735 billion, in 2014-15. As the Confederation of British 
Industry concludes: “The UK’s net budgetary contribution is a small 
net cost relative to the benefits.” 
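
The budget figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in this editorial; the variable and function names are illustrative, and the 52-week year is a simplifying assumption rather than anything stated in the text.

```python
# Figures quoted in the editorial for 2014-15, in pounds sterling.
NET_CONTRIBUTION_GBP = 8.8e9        # UK net contribution to the EU budget
TOTAL_PUBLIC_SPENDING_GBP = 735e9   # total UK public spending
CLAIMED_WEEKLY_GBP = 350e6          # gross weekly figure used by the 'Leave' campaign

WEEKS_PER_YEAR = 52  # assumption: a simple 52-week year


def net_share_percent(net: float, total: float) -> float:
    """Return the net contribution as a percentage of total spending."""
    return 100.0 * net / total


def weekly_equivalent(annual: float, weeks: int = WEEKS_PER_YEAR) -> float:
    """Spread an annual amount evenly over the weeks of a year."""
    return annual / weeks


share = net_share_percent(NET_CONTRIBUTION_GBP, TOTAL_PUBLIC_SPENDING_GBP)
weekly_net = weekly_equivalent(NET_CONTRIBUTION_GBP)

print(f"Net contribution as a share of public spending: {share:.2f}%")
print(f"Net contribution per week: £{weekly_net / 1e6:.0f} million")
print(f"Claimed weekly figure: £{CLAIMED_WEEKLY_GBP / 1e6:.0f} million")
```

Under these assumptions the share comes out at roughly 1.2%, in line with "slightly more than 1%", and the weekly net figure falls well below the £350 million claim. Note that this weekly net figure is not the same measure as the "less than £250 million" quoted above, which accounts only for the money Britain receives back.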

We urge UK readers to critically examine the issues and to get out 
and vote — because every vote in this crucial election will matter.


THIS WEEK | EDITORIALS
Under the sea 


If life in the oceans is to be preserved, people
must get to know the wonders of the deep. 


It was World Oceans Day last week, and the annual event
highlighted once again just how poorly studied two-thirds of our
planet’s surface is. But this year’s tag line, “Healthy Oceans, Healthy
Planet”, should remind us that we do know some things about the
sea — notably, how much people depend on it.

Millions of people rely directly on food taken from ocean waters,
and millions more depend on money from fishing, tourism and other
marine activities. But across the world, these relationships are often
undermined.

Nowhere is this more apparent right now than at the world’s coral
reefs. Bathed in warming waters, reefs everywhere are bleaching as
the corals on them sicken and turn white. Many will die, and so will
animals that live on them.

The outlook for corals is bleak, but it is not yet hopeless. Online
this week, we publish one approach that could point to ways to
rescue them from the brink (J. E. Cinner et al. Nature
http://dx.doi.org/10.1038/nature18607; 2016). A huge analysis of data on fish
found at more than 2,500 reefs identifies 15 ‘bright spots’ — reefs
in a better state than models suggest they should be — and then
digs into the factors that might be responsible. Bright spots include
unpopulated, unfished regions such as the Chagos islands, and
areas that are close to towns and are fished, such as Kiribati and the
Solomon Islands. The study also pinpoints 35 ‘dark spots’ where
conditions were surprisingly poor, such as Montego Bay in Jamaica
and Lord Howe Island in the Tasman Sea between Australia and
New Zealand.

The researchers used information on a reef’s habitat, depth, nearby
human population and amount of fishing to model how many fish
could live at each site.

Such insights can help to steer conservation efforts. And conservation
of coral reefs is a popular cause. More difficult is the protection
and preservation of what lies deeper.


Although there is a huge public appetite for documentaries that 
detail the wonders found under the surface of our seas, to many people 
the oceans are a mysterious, even threatening, place. This feeling is 
reflected in — and doubtless enhanced by — the approach of story- 
tellers. From storms and sharks to mystery and other-worldliness, the 
oceans are made to seem an unknown and unknowable place: it is 
never safe to go back in the water. 

What we do know about life beneath the waves does sometimes 
make its way into the public consciousness. The 2003 animated film 
Finding Nemo, for example, delighted not just the public but also 
marine biologists, many of whom were impressed that the ocean they 
knew had been represented with such fidelity in how the animals
moved and interacted (talking fish notwithstanding).

On page 325, we interview one of the people responsible for that
accuracy: Adam Summers of the University of Washington in Friday
Harbor. (He also worked on the sequel, Finding Dory, which lands this
week.) Summers rightly points out that although filmmakers often
need to bend or even break the truth to tell stories, facts can add 
something, too. 

As a biomechanist, his contribution was both to supply general fish 
facts, such as insights about the whale-shark character, and to give 
precise feedback on how the animals could move realistically even 
when they were doing things that no marine animal could actually 
do. If you watch and are amazed by the octopus sequences in the film, 
you will see the result of imbuing teams of highly talented animators 
with the knowledge of professional scientists. 

There are many marine researchers who reach out to the public 
and inspire a love of the sea by discussing their work. This should be 
applauded. But there are also many who only really talk to other ocean 
scientists about their work (a problem far from unique to the field). 

If more landlubbers are to engage with the oceans, and understand 
and appreciate them as researchers do, then all involved must do more 
to emphasize more widely the wonders of the depths and the threats 
that face them. 

Finding Nemo and Finding Dory may please scientists with their 
accuracy, but it would be a tragedy and a disaster if future generations 
had to watch them to find out what a coral reef looked like.


Nature distilled 


We need your views on an experiment to 
convey the latest research in digestible form. 


Since 1869, Nature has set itself two goals, which can be boiled
down to presenting science and its implications to the public,
and presenting them to professional researchers. Public outreach
is important for science — it is the public that pays for most
of it — and with much of our magazine content and the brief summaries
of research papers made accessible to journalists in advance,
much good science is available to them. But what of the professional
researchers — how can Nature best present science to you?

Any journal that tries to publish the most important results
that it is sent, in all fields of science, will run into the same problem.
Every bit of our output, we hope, is useful and interesting to
somebody somewhere. But even the most optimistic of our editors
would concede that the pool of readership for each of these
specific advances is only a small subsection of our audience,
professional researchers included. To the outside world, science
is science. To those who read Nature, science is a multiplicity of
specialisms — and specialists.

296 | NATURE | VOL 534 | 16 JUNE 2016

We know that most of you are specialists, and that you don't 
read most of what we present to you. You're busy people. It is hard 
enough to follow the literature that you need to read. Even the titles 
of research papers in an unfamiliar field can look incomprehensi- 
ble. But if you're anything like us, one reason you got into science 
in the first place was curiosity about the world — and not just the 
tiny piece of it that you now focus on. Wouldn't it be useful and 
interesting to keep better track of the rest? Or at least, the rest that is 
published in Nature, and therefore already judged to be important? 

We think so, and this week we begin an experiment to see how 
many of you agree. We have revisited 15 recently published Nature 
papers and asked the authors to produce two-page summaries of 
each. The summaries remain technical — these are not articles suit- 
able for the popular press — but they try to communicate both the 
research advance and why it matters. The authors of these papers 
have been enthusiastic — they want the broadest possible reader- 
ship — and we thank them for their cooperation. Now we want 
to know what you think. The first three summaries are published 
online this week (see go.nature.com/luhcy3x). The rest will be 
released in the coming weeks. Please take a look. Be brave — pick a
topic that you expect to struggle with — and then fill in the online
survey to let us know what you think.
Science can map a solution
to a fast-burning problem


Wildfires such as those that hit Canada last month are a growing worry, writes
Marc-André Parisien, but risk-assessment models can limit future damage.


Fort McMurray region of Canada last month, but the inferno
is likely to burn until the snow falls later in the year. The raging
fire was a true force of nature. It sent out embers that skipped natural
barriers such as the Athabasca River, hundreds of metres wide, and
created its own weather system: pyrocumulus clouds that generated
lightning, which ignited another fire some 30 kilometres away. The
fire was unstoppable.

Unfortunately, the largest residential community in the Canadian
boreal forest was in its way, and the resulting scenes of devastation
drew attention from around the world. The fire is projected
to be the costliest natural disaster in the nation’s modern history.
A repeat of last month’s scenes is unacceptable, so what can be done?
One thing is certain: there is now a tacit understanding among all
Canadians that something must change.

Science has the tools to drive that change. Canada and other
fire-prone regions should develop new maps of fire risk and use them to
guide development and mitigation efforts. Similar maps are already
produced around the world for areas prone to flooding and earthquakes, and
it is unfortunate that so little has been done so far to measure
wildfire risk. This is especially true across Canada, where hundreds of
communities embed themselves in one of the most flammable places on Earth.

Wildfire spreads through a boreal forest with a speed and intensity
not seen in other landscapes. But a fire’s impact depends on factors
that we can measure. With data on fire ignitions, weather, vegetation
and topography, we can build models to
demonstrate how we expect a region to burn should it catch fire.
These can show two things that are important to guide policy: the
probability of burning, and the likely fire intensity. The first shows
the chances of a fire taking hold, and the second indicates how severe
the consequences will be.

These maps show which areas, if they ignite, will burn at such a
high temperature that attempts to fight the fire will never succeed.
The only option is to evacuate, or not to live there in the first place.
The maps can also identify parts of the forest where, because of the
nature of the landscape and flora, fire would be easier to prevent and
tackle. This knowledge can be used to allocate money and effort to
places where mitigation is more likely to work.

A range of possible policies can minimize fire risk. Wood shingles
used to build houses can be replaced by metal and asphalt; vents and
gaps that can admit embers can be blocked; and flammable landscape
can be reduced. Some houses burned down in Fort McMurray
because homeowners had planted conifer trees and shrubs — a poor
choice, given that conifers burn much more readily than do deciduous
trees in this part of the world. The surrounding forests can be
managed too. Prescribed burning of vegetation and other measures 
can set up effective firebreaks. 

This is not always easy or popular. I lived in one of these commu- 
nities as a child. Many people make a conscious decision to live in 
the forest and don’t want to see it altered. To fell trees and mandate 
the type of shrubbery they are allowed to plant can feel like a viola- 
tion. But using fire-risk maps can sometimes show that the most 
effective intervention might be a few kilometres upstream of the 
prevailing wind. Fire management does not always require changes 
on people's doorsteps. Indeed, the recent blaze started as four sepa- 
rate fires — and it was the one that started the farthest from the town 
that caused the greatest damage. 

After the Fort McMurray fire, our team at 
the Canadian Forest Service went back and 
produced fire-risk maps for the region. They, 
correctly, showed that a fire in the area was 
likely to take hold, and that when it did it would 
be unstoppable. 

We need to set up a systematic framework 
to construct more of these maps. In Canada, 
this would demand a partnership between 
the federal government and provinces and 
municipalities that directly manage boreal 
wildlands. In other countries, similar part- 
nerships could be formed to great benefit. 
We would also need to consult and engage 
with the people who live and work in the for- 
est. It would not be cheap, but a Canada-wide 
framework to assess wildfire risk would still 
be a fraction of the cost of a major fire incident. The good news is that 
we already have much of the data, as well as the tools and the expertise. 

Continued human expansion into the Canadian boreal forest for 
natural-resource extraction and housing is inevitable. It is too late 
for Fort McMurray, but risk-assessment maps can guide this new 
development and direct it to low-risk areas. Some of these places are 
obvious: new settlements could take advantage of natural firebreaks 
such as large lakes to help shield them. Other preferable areas could 
be more surprising and would be identified only with the help of these 
maps; for example, landscape and vegetation in such areas might 
combine in unusual ways to reduce risk. 

In a warmer and drier future climate, more fires are expected. We
need to use our knowledge of boreal fire dynamics to find better ways
to live and work safely in this hazardous environment.


Marc-André Parisien is a fire researcher with the Canadian Forest 
Service in Edmonton, Canada. 
e-mail: marc-andre.parisien@canada.ca 
RESEARCH HIGHLIGHTS
Selections from the scientific literature


CRISPR blocks
cancer growth


Knocking out genes in cancer genomes with the CRISPR-Cas9 technique
decreases the ability of cancer cells to multiply.

William Hahn at the Dana-Farber Cancer Institute in Boston, Aviad
Tsherniak at the Broad Institute of Harvard and MIT in Cambridge — both
in Massachusetts — and their colleagues silenced certain genes in 33
cancer-cell lines using CRISPR-Cas9, which can be programmed to snip
DNA at specific locations. They found that in parts of the genome with
multiple copies of a gene, the number of DNA breaks made by the CRISPR
system was linked to a drop in cell proliferation, an outcome not seen
with another gene-silencing tool called RNA interference. This effect
could be the result of how CRISPR-made DNA cuts are repaired.

The results suggest that cancer cells are sensitive to site-specific
DNA damage, and have implications for how experiments using CRISPR
should be interpreted. Targeting genomic regions that have many
repeated sequences could be a new therapeutic strategy, the authors
suggest.
Cancer Discov. http://doi.org/bjzn (2016)


Excess nitrogen
spoils biofuels


Nitrogen fertilizer can boost the growth of crops for biofuel
production, but applying too much can cut the climate benefits in half.

Ethanol fuel made from plant cellulose is a promising form of
renewable energy. Philip Robertson at Michigan State University in
Hickory Corners and his colleagues applied various amounts of nitrogen
fertilizer to experimental plots of switchgrass (Panicum virgatum) for
three years. They measured emissions of the greenhouse gas nitrous
oxide (N₂O) and the leaching of nitrate, a water pollutant. The authors
found that fertilizer boosted yields in the first year, but that the
increase declined with subsequent applications. Levels of both
emissions and leaching grew exponentially with increases in fertilizer.
The team suggests that minimizing fertilizer use will be crucial for
maintaining the environmental benefits of cellulosic biofuel.
Environ. Res. Lett. 11, 064007 (2016)


How squid hide their eyes


A transparent squid may camouflage itself by
activating specialized cells in its eyes.

Many marine creatures emit light to hide
shadows that might be seen by predators
below. To find out how animals control this
bioluminescence, Amanda Holt and Alison
Sweeney at the University of Pennsylvania
in Philadelphia used transmission electron
microscopy to study the eyes of the squid
Galiteuthis (pictured). They found that the
underside of the eye — one of the few parts
of the creature that is not transparent —
has fibre-like cells in a range of shapes that
channel bioluminescence while leaking light at
different rates.

The authors modelled how the light travels
through the various cell shapes. They suggest that
the squid could activate different populations of
cells to vary the intensity and distribution of the
light passing through them, allowing the animal
to camouflage itself at any depth.
J. R. Soc. Interface 13, 20160230 (2016)


Tiny carbon rods
blow off steam


Nanometre-sized rods of carbon can expel water in puffs of vapour when
the air is already humid.

Materials such as carbon and silica gels typically pick up moisture as
humidity increases. But Satish Nune and his colleagues at the Pacific
Northwest National Laboratory in Richland, Washington, found that their
carbon-based nanorods take up water at low humidity and then give off
about half of it when the relative humidity exceeds 50-80%. The team
thinks that water condenses between adjacent rods and then capillary
forces draw the rods together until the water bursts from the ends of
the rods and evaporates.
Nature Nanotechnol. http://dx.doi.org/10.1038/nnano.2016.91 (2016)


Fish keep coming 
out of water 


Fish have evolved to live on 
land multiple times, suggesting 
that the crucial transition 
from water to land during the 
evolution of terrestrial life may 
not have been unusual. 

Terry Ord and Georgina 
Cooke at the University 
of New South Wales in 
Kensington, Australia, looked 
at data on the behaviour and 
ecology of living fish and 
identified 33 different families 
that include amphibious 
species, some of which seldom 
leave the land. In one family, 
the blenny fish (Blenniidae), 
amphibious lifestyles evolved 
3-7 times. The duo observed 
one primarily aquatic species 
of blenny (Praealticus
labrovittatus) emerging onto
land on warm days on the 
western Pacific island of 
Guam. 

The ability to survive on 
land could help fish to cope 
with the low oxygen levels of 
warm seawater, and prevent 
them getting stuck in tidal 
pools, the authors propose. 
Evolution http://doi.org/bjzq 
(2016) 


MICROBIOLOGY 


A wealth of anti- 
CRISPR proteins 


Proteins that inhibit the 
activity of the CRISPR-Cas 
bacterial defence system 
could be widespread. 
Viruses and other 
microbes often successfully 
transfer genes to bacteria, 
despite the presence of 
the bacterial CRISPR-Cas 
system, which recognizes 
and attacks foreign DNA 
or RNA. Karen Maxwell 
and Alan Davidson at 
the University of Toronto 
in Canada and their 
colleagues had previously 
described nine families of
anti-CRISPR protein that
help certain viruses to infect 
Pseudomonas bacteria. 
Now, using bioinformatics, 
the team has identified 
five more anti-CRISPR 
protein families in a range of 
microorganisms that inhibit 
CRISPR-Cas systems in 
Pseudomonas aeruginosa and 
Pectobacterium atrosepticum. 
Anti-CRISPR proteins 
could have an important role 
in gene transfer between 
bacteria, including the 
spread of genes involved 
in antibiotic resistance, the 
authors say. 
Nature Microbiol. http://dx.doi. 
org/10.1038/nmicrobiol.2016.85 
(2016) 


Liquid-like solid 
lets cells grow 


A scaffold made of tightly 
packed hydrogel particles 
allows cultured cells to grow 
in custom 3D configurations. 

Developed by Thomas 
Angelini and his colleagues 
at the University of Florida 
in Gainesville, the scaffold 
is made of a liquid-like solid
material that temporarily 
becomes fluid when force is 
applied, and rapidly solidifies 
after the force is removed. 
Angelini’s team 3D-printed 
clusters of various types of 
cell inside the liquid-like 
solid, creating multicellular 
structures in the shape of a
sphere, a loop and a simple 
flower (pictured). 

In contrast to other, stiffer
scaffolds used for 3D cell
culture, this one is not easily
damaged when cells are 
injected into it, and does 
not need to be broken down 
by enzymes to allow cells to 
grow and migrate. 

ACS Biomater. Sci. Eng. 
http://doi.org/bjzp (2016) 


NEUROSCIENCE
Myelin clogs up 
immune cells 


The insulating layer around 
nerve fibres breaks down as 
mice age, and this could lead 
to immune dysfunction. 

The myelin layer coats 
nerves to speed up signal 
transmission. Mikael Simons 
at the Max Planck Institute 
for Experimental Medicine 
in Göttingen, Germany, and
his colleagues used electron 
microscopy to study the brains 
of mice. They found that the 
amount of myelin fragments 
increased with age and that 
the pieces were taken up by 
immune cells in the brain 
called microglia, which engulf 
debris and foreign materials. 
During this process, insoluble 
fatty aggregates accumulated 
in the microglia and the ability 
of the cells to take up material 
declined. 

The authors suggest 
that microglia become 
overwhelmed with the 
growing amount of myelin 
debris, making them less 
able to function in the ageing 
brain. 

Nature Neurosci. http://dx.doi. 
org/10.1038/nn.4325 (2016) 


DEVELOPMENTAL BIOLOGY 


Dragon lizard gets 
sex change 


A shift in egg-incubation 
temperature can result in a
genetically male lizard having a 
mix of male and female traits. 

The sex of some reptile 
species is determined by 
genetics, but in others it 
depends on egg-incubation 
temperature. Richard Shine 
at the University of Sydney in 
Australia and his colleagues 
studied hatchlings and 
juveniles of the central bearded 
dragon (Pogona vitticeps; male 
pictured). In this species, sex is 
normally controlled genetically, 
but incubation temperatures of 
32°C and above can produce 
sex-reversed females from male 
embryos. The team incubated 
eggs at constant temperatures 
between 26°C and 34°C, and 
found that although sex- 
reversed females are capable 
of laying eggs — and even 
produce more eggs than genetic 
females — they are similar to 
males in their morphology and 
behaviour. 

This mix of traits could 
enhance fitness under certain 
conditions, which could 
cause a rapid elimination of 
sex-determination genes, the 
authors say. 

Proc. R. Soc. B 283, 20160217 
(2016) 
SEVEN DAYS


EVENTS 


Nobel support 

As the British people prepare 
for a 23 June vote on whether 
to leave or remain in the 
European Union, 13 Nobel 
laureates have lent their 
support to staying in. In a letter
published on 10 June in the
newspaper The Daily Telegraph,
the laureates, including
physicist Peter Higgs and
geneticist Paul Nurse, warned
that UK science would suffer in
the event of a ‘Brexit’. “Science
thrives on permeability of ideas
and people, and flourishes
in environments that pool
intelligence, minimise barriers,
and are open to free exchange
and collaboration,” they argue,
stressing that the EU provides
just that environment. See
page 307 for more.


Cyberattack


The University of Calgary in Canada said on 7 June that it had paid a
ransom worth Can$20,000 (US$15,600) in untraceable Bitcoins to hackers
who had encrypted much of the university’s past 50 years of research
data. A spokeswoman for the university said that no personal or
university data had been released to the public. Most of the
researchers affected regained access to their data within three days,
she said. The university was confident that it could restore most
missing data from back-ups, and that it had purchased the decryption
keys from the hackers only as a last resort.


STANDING UP FOR SCIENCE
Nominations are invited for the John Maddox Prize, which rewards an
individual in any country who has promoted sound science and evidence
on a matter of public interest. The £2,000 (US$3,000) prize puts
emphasis on those who have faced difficulty or hostility for their
efforts. The prize is awarded by Nature and the London-based Sense
About Science, and is supported by the Kohn Foundation. The deadline
for nominations is 1 August 2016. See go.nature.com/9rvd1t.


Brighter nights for Earth


The glow of artificial light denies one-third of humanity the
sight of the Milky Way. The ‘new world atlas of artificial night
sky brightness’, produced with recently available imaging
data from NASA’s high-resolution Suomi National Polar-orbiting
Partnership satellite, maps global night pollution in
improved detail. Released on 10 June, it shows that nocturnal
light pollution (pictured) is worst in Italy and South Korea,
with Canada and Australia the least affected of industrialized
countries (F. Falchi et al. Sci. Adv. 2, 1600377; 2016). The
authors calculate that the ongoing conversion from high-pressure
sodium lamps to white LED lighting could lead to a
doubling of the undesired brightness of the night sky.


Elemental choice 
Three newcomers to the 
periodic table are to be called 
nihonium, moscovium
and tennessine in reference
to Japan, Moscow and
Tennessee, where scientists
first created them. A fourth 
entry, oganesson, is to be 
named in honour of 83-year- 
old Russian scientist Yuri 
Oganessian, who helped to 
discover it — only the second 
time that a chemical element 
has been named after a living 
scientist. The International 
Union of Pure and Applied 
Chemistry proposed the 
names and symbols of the 
four elements with atomic 
numbers 113, 115, 117 and 
118 — short-lived artificial 
elements that do not occur 
in nature — on 8 June. See 
go.nature.com/1vowh1u for 
more. 
Scottish adviser 


Sheila Rowan became chief
scientific adviser to the
Scottish government on
13 June. The University of
Glasgow physicist is part of
the US-based Advanced Laser
Interferometer Gravitational-
Wave Observatory (LIGO)
collaboration, which in
2015 made the first direct
observation of gravitational
waves. In an appointment
lasting three years, Rowan
is tasked with providing the
Scottish government with
expert advice on science-
related issues, and with
championing the use of science
in policy development.


Jerome Bruner dies 
Psychologist Jerome Seymour 
Bruner died at the age of 100 
on 5 June. Working at Harvard 
University in Cambridge, 
Massachusetts, then at the 
University of Oxford, UK, and 
New York University, Bruner 
was a pioneer of cognitive 
psychology before turning his 
attention to developmental and 
educational psychology. He 
coined the term ‘scaffolding’ to 
describe how children build on 
the knowledge they have, and 
proposed a teaching approach 
based on it. He was on the 
science advisory committees 
during the presidencies of 
John F. Kennedy and Lyndon
B. Johnson. 


RESEARCH
LIGO strikes again 


Nine months after the first 
discovery of gravitational 
waves made science history, 
news broke on 14 June that
a similar event has been
observed. On 26 December,
the twin detectors of
the Advanced Laser
Interferometer Gravitational-
Wave Observatory (LIGO) in
Louisiana and Washington
state detected the tell-tale sign
of two black holes spiralling 
into each other and merging 
about 429 million parsecs 
away. Researchers reported 
the finding this week, at a 
meeting of the American 
Astronomical Society in San 
Diego, California. 


FUNDING
NIH windfall 


The US National Institutes
of Health (NIH) is poised to
receive its second major budget
increase in two years. On
7 June, the US Senate voted to
add US$2 billion to the agency’s
annual budget, bringing it
to $34 billion. The boost will
have to be reconciled with the
US House of Representatives’
spending bill, which has yet
to be released. The windfall
is to be focused on specific
programmes, including a 
precision-medicine initiative 
and research on Alzheimer’s 
disease, but does not include 
any money for the Cancer 
Moonshot initiative promoted 
by US vice-president Joe Biden. 


ENVIRONMENT 
Deer prion in moose 


The discovery in Norway
of two moose infected
with chronic wasting
disease suggests that the
neurodegenerative disorder
might be gaining traction in
Europe. The fatal, infectious
prion disease is related
to bovine spongiform
encephalopathy (‘mad cow
disease’), and was thought to be
limited to deer, elk and moose
in North America and South
Korea. In April, researchers
at the Norwegian Veterinary
Institute in Oslo detected it
for the first time in Europe
in a wild reindeer (Rangifer
tarandus tarandus; see Nature
http://doi.org/bjz5; 2016). Since
May, at least three Norwegian
moose (Alces alces, pictured)
have tested positive for the
disease, which is caused by
abnormal proteins. Researchers
don’t yet know how the disease
spread to Europe, or in which
species it originated in Norway.


Chemical bill
The US Senate on 7 June
approved a bill to overhaul
and strengthen the 1976 Toxic
Substances Control Act, which
has been widely criticized
as ineffective. Companies
have registered some
85,000 chemicals with little or
no safety review by government
regulators. Roughly
700 new chemicals enter the
marketplace each year. The
updated legislation would
grant the US Environmental
Protection Agency authority
to investigate hazardous
chemicals, both new and old,
that are used in consumer
products or commercial
and industrial processes. US
President Barack Obama, who
strongly supports the reform, is
expected to sign the legislation
into law.


Climate pledges
US President Barack Obama
and Indian Prime Minister
Narendra Modi agreed to a
suite of climate goals during
a meeting at the White House
on 7 June. The two leaders
pledged to push for a 2016
amendment to the Montreal
Protocol on Substances that
Deplete the Ozone Layer
to reduce the use of heat-
trapping hydrofluorocarbons,
which are commonly used in
air-conditioning units. They
also announced a string of
clean-energy partnerships,
and Obama promised to
ratify the 2015 Paris climate
agreement as early as this
year.


Visa overhaul
The European Commission
has proposed changes to
the visa scheme for highly
educated prospective
migrants, to facilitate the
legal immigration of skilled
workers. The existing Blue
Card system, adopted in
2009, has failed to attract
much foreign talent to the
European Union because
of restrictive admission
conditions and conflicting
national visa rules, EU
migration commissioner
Dimitris Avramopoulos
said on 7 June. A relaxed
EU-wide scheme would
enable immediate work
permits and quicker access
to long-term residence. The
United Kingdom, Ireland and
Denmark are exempted from
the proposed visa scheme.


TREND WATCH
Scientists spend an average of
8.4 hours per article reviewing
papers submitted for publication.
But reviewing times differ with
subject area and reviewer age,
according to a survey of opinions
and attitudes towards peer review
by the London-based Publishing
Research Consortium. On
average, reviewers spend twice as
long on papers in mathematics
and computer sciences as on
papers in clinical and social
sciences. Academics who are
older than 65 take half the time of
those under 36.

[Chart: ‘Peer review by the hour’. Reviewing maths papers is most time-consuming. Bars show time spent per paper (hours, axis 0 to 14) for arts and humanities; engineering; maths and computer science; sciences; and social sciences.]


SEVEN DAYS | THIS WEEK


18 JUNE
Crew members
Tim Peake, Yuri
Malenchenko and Tim
Kopra return to Earth
from the International
Space Station.
go.nature.com/lungtl


20-24 JUNE
The 11th International
Conference on
Permafrost takes place
in Potsdam, Germany.
icop2016.org


> NATURE.COM 
For daily news updates see: 
www.nature.com/news 


16 JUNE 2016 | VOL 534 | NATURE | 301 


© 2016 Macmillan Publishers Limited. All rights reserved. 


MAURO FERMARIELLO/SPL 


NEWS IN FOCUS


SPACE Thousands of volunteers will comb the ground for meteorites p.304
THERAPIES Biotechnology successes create cost conundrum p.305
SPECIAL REPORT Five core ways that the EU has changed science p.307
MEDICINE When old drugs find new uses, prices could be slashed p.314




Many pets are treated like family members — and that is often reflected in the veterinary care that they receive. 


Demand for pet medicines 
sparks a biotech boom 


Longer-lived animals inspire a new breed of care — from antibodies to cell therapies. 


BY HEIDI LEDFORD 


Little Jonah once radiated pain. The
Maltese dog’s body was
curled and stiff from the effort of walking
with damaged knees. But after Kristi
Lively, Jonah’s veterinary surgeon, enrolled
him in a clinical trial of a therapeutic antibody
to treat pain, his owner returned to the Vil- 
lage Veterinary Medical Center in Farragut, 
Tennessee, with tears in her eyes. Her tiny 
companion trotted easily alongside her. “I got 
my dog back,” she said.


Such cutting-edge treatments were once 
reserved for humans. But in recent years, the 
changing nature of pet ownership has sparked 
a boom in sophisticated therapies for animals 
— and many are now approaching the mar- 
ket. On 9 June, the company that sponsored 
the antibody trial, Nexvet of Dublin, pre- 
sented its results at the American College of 
Veterinary Internal Medicine Forum in Den- 
ver, Colorado. Other companies are working 
on bone-marrow transplants, sophisticated cell 
therapies and cancer vaccines. 

“When I was a child and just wanted to be
a veterinarian, certainly I didn’t imagine I’d
be doing what I’m doing now,” says Heather
Wilson-Robles, a veterinary oncologist at 
Texas A&M University in College Station, 
who is engineering canine immune cells to 
fight cancer. 

Cancer, arthritis and other diseases 
associated with old age are becoming more 
common as pets live longer, thanks in part to 
better treatment by their owners. “A genera- 
tion ago, as beloved as Snoopy was, he lived 
in the backyard in the doghouse,” says Steven
St. Peter, president of Aratana Therapeutics,
a pet-therapy company in Leawood, Kansas.
Now, pets are considered family members, 
often sharing beds with owners who are willing 
to pay hefty veterinary bills. 

Many standard pet treatments are human 
drugs given at lower doses to account for 
animals’ smaller size. But antibodies and cell 
therapies generally cannot be used across spe- 
cies without provoking an unwanted immune 
response. And some human treatments sim- 
ply will not work in pets: many common pain 
medications are toxic to cats. 

Nexvet, which has raised more than 
US$80 million from investors since it was 
founded in 2011, takes antibodies that have 
been approved as human medicines and 
alters their structures to make them effective 
in cats or dogs. Moving from a drug lead to 
safety testing takes about 18 months, says chief 
executive Mark Heffernan, who estimates that 
Nexvet’s antibody therapies for pain will cost 
around $1,500 a year. The company is now 
looking into developing antibodies that block 
a protein called PD-1, thereby unleashing the 
immune system to fight cancer. This approach 
has shown tremendous promise for treating 
cancer in people. 

Aratana is also developing antibody thera- 
pies for pets, and has applied for regulatory 
approval of a cancer vaccine that uses a bacte- 
rium to target malignant cells. The company 


hopes to move into cell therapies, and to 
develop a way to manufacture stem cells from 
fat for use against joint pain. St. Peter wants his 
company to be the first to win approval from 
the US Food and Drug Administration for a 
stem-cell therapy — ahead of firms developing 
such treatments for people. 

Other forms of cell therapy could also 
result in new veterinary remedies. Last 
July, veterinary oncologist Colleen
O’Connor founded a cancer-treatment
company in Houston, Texas, called
CAVU Biotherapies.
To treat lymphoma, 
CAVU aims to isolate a sick dog’s immune 
cells, rejuvenate them in culture, and then 
infuse them back into the dog’s blood to stim- 
ulate an immune response. O'Connor used a 
similar approach in 2011 to treat Dakota, a 
bichon frise that belonged to then-US Sena- 
tor Kent Conrad (Democrat, North Dakota). 
The dog, a Capitol Hill fixture known as the
“101st senator”, entered remission but later
died of cancer. 

For many pet owners, cost is no object. 
Steven Suter, a veterinary oncologist at North 
Carolina State University in Raleigh, runs a 
bone-marrow transplant clinic for dogs that
claims to cure 33% of lymphomas. Suter’s
clinic was booked solid after it opened in 2008, 
despite offering treatment that can cost a dog 
owner up to $24,000. Still, Suter has worked 
to drive down the cost of care: to filter stem 
cells from blood, his clinic uses second-hand 
machines that were donated by a physician 
with a soft spot for schnauzers. Earlier this 
year, several major pet-insurance companies 
added bone-marrow transplants to the lists of 
procedures that they will pay for. 

But when it comes to the latest pet treat- 
ments, some animals might be more equal 
than others. Cats are “physiologically finicky”,
Suter says, noting that they may be too small 
to allow bone-marrow transplants using his 
usual machines. And O’Connor notes that 
cats’ immune systems also differ wildly from 
those of both humans and dogs — mean- 
ing that more basic research must be done 
before sophisticated immunotherapies can be 
deployed against feline ailments. 

At Lively’s clinic, many dog and cat owners 
were grateful that their animals could participate
in Nexvet’s clinical trial. But about a
month after the trial ended, the effects of the 
antibody therapy began to fade. Jonah’s owner 
was among the clients who called Lively, desperate
for a way to access the treatment again.
“It’s tough,” Lively says. “They’ll have to wait
until this product comes to market.”


ASTRONOMY 


France launches massive 
meteor-spotting network 


Tracking space rocks that reach Earth will give insight into the early Solar System. 


BY TRACI WATSON 


Scientists in France have launched an
unprecedented campaign to catch
shooting stars, an effort that will rely on
thousands of volunteers to comb the ground
for bits of space rock.

The Fireball Recovery and InterPlanetary
Observation Network (FRIPON), inaugurated
on 28 May, already includes 68 cameras
that scan the skies for meteors, which are 
seen when bits of asteroid, comet or other 
planetary material streak through Earth’s 
atmosphere. By the end of this year, some 
100 cameras will blanket France, organizers 
say. That would make it one of the biggest 


and densest meteor-spotting networks in the 
world. 

“If tomorrow a meteorite falls in France, 
we will be able to know where it comes from 
and roughly where it has landed,” says Jérémie 
Vaubaillon, an astronomer at the Paris Observatory
and one of the organizers of the system.

Meteorites — chunks of stone that have
fallen from space and reached Earth’s surface
— provide valuable insights into everything
from the history of the Solar System to
the identity of asteroids that could potentially 
collide with Earth. Snagging such objects is
“the one chance you get to see Solar System
material in your hands”, says David Clark, who
studies meteors at the University of Western
Ontario in London, Canada. “We simply don’t
have enough of this stuff.”


FIRE IN THE SKY 

Especially prized are meteorites that were 
tracked on their inward journey. Scientists 
can use data about the journey to reconstruct 
the object’s trajectory and reveal where in the 
Solar System it came from. People manage 
to retrieve just one to three meteorites with 
known trajectories each year, says Peter 
Jenniskens, an astronomer at the SETI Institute 
in Mountain View, California. 

FRIPON’s organizers dream of collect- 
ing one tracked meteorite per year from the 
French landscape. By comparison, researchers 
with the large and dense Spanish Meteor Net- 
work have scored 2 in the past 12 years. 

The French network’s cameras are very 
densely and evenly spaced, sitting roughly 
70-80 kilometres apart at laboratories, 
science museums and other buildings — 
close enough together to yield good infor- 
mation about where meteorites land. “That 
increases your chance of finding something,” 
says Jenniskens. 

FRIPON is also the first fully connected and 
automated network, says principal investigator
François Colas, of the Paris Observatory.


Fisheye cameras will cover France as part of the meteor-spotting network. 


When a camera detects a meteor, it sends a 
message to a central computer in Paris. If two
or more cameras spot the fireball, FRIPON 
scientists receive an e-mail describing where 
it was seen. Eventually, the e-mail will include 
automatically generated information about 
the object’s probable landing zone, pin- 
pointing it to an area roughly 1 kilometre by 
10 kilometres. 

The researchers will then face the arduous 
job of searching this area to find the object. 
At first, scientists will conduct the ground 
searches. But in the next few years, FRIPON 
organizers plan to train an army of citizen 


scientists to walk the French landscape look- 
ing for bits of meteorite — and to hand over 
any finds. 

Perhaps one in 1,000 volunteers will actually 
turn up for a search, estimates Brigitte Zanda, 
a meteorite specialist at the National Museum 
of Natural History in Paris, who heads the vol- 
unteer effort. Organizers hope to field a search 
team of 30 people in every part of France, so 
they will have to recruit hundreds of thousands 
of people, she says. “It’s ambitious.” But hun- 
dreds of people have already signed up, even 
though the official recruitment drive is just 
getting under way.


DRUG PRICING 


Gene therapies pose 
million-dollar conundrum 


Economists, investors and medical insurers can’t work out how to pay for cutting-edge drugs. 


BY ERIKA CHECK HAYDEN 


Drugs that act by modifying a patient's
genes are close to approval in the
United States, and one is already
available in Europe. The developments mark 
a triumph for the field of gene therapy, once 
considered controversial. 

But with estimated price tags of at least 
US$1 million per patient, how will anyone 
pay for these treatments? The question is just 
one in a broader debate about how to finance 
a range of super-expensive drugs that are now 


available, thanks to an explosion in genetic 
and molecular-biology research over the 
past 20 years. 

“Advances in science are presenting a social 
affordability question like never before,” says
economist Mark Trusheim at the Massachu- 
setts Institute of Technology in Cambridge. 
“Do we want to convert the science into thera- 
pies that we actually would have to pay for?” 

Trusheim spoke at the Biotechnology 
Innovation Organization (BIO) meeting in 
San Francisco, California, on 6-9 June, which
featured much discussion about how society
will pay for the rising costs of new drugs. At the
American Society of Clinical Oncology meet- 
ing in Chicago, Illinois, on 3-7 June, dozens of 
talks and abstracts focused specifically on the 
growing cost of cancer care. Cancer drugs that 
unleash the power of the immune system cost 
up to $40,000 per year. 

Gene therapies that are close to US approval 
include treatments for haemophilia B, sickle- 
cell anaemia and the neurodegenerative dis- 
ease cerebral adrenoleukodystrophy. A therapy 
under development at Spark Therapeutics in 
Philadelphia, Pennsylvania, for a type of
blindness is considered the most advanced.

Many of the treatments deliver corrective 
genes using a modified virus that is considered 
safer than vectors used in earlier attempts. But 
many of the target disorders are rare, limiting 
the population that can be treated. And there 
are often no previously approved drugs that 
work similarly, removing the pressure on com- 
panies to lower their prices. 

Such therapies could cost $1 million per 
patient, estimate haematologist Stuart Orkin 
of Harvard Medical School in Boston, Massa- 
chusetts, and Philip Reilly, an investor with 
Third Rock Ventures in Boston (S. H. Orkin 
and P. Reilly Science 352, 1059-1061; 2016). 
Reilly co-founded Cambridge-based Bluebird 
Bio, which is working on several of the gene 
therapies that are close to market. 

That's the same price as Glybera, the gene 
therapy given the green light by European reg- 
ulators in 2012, which has been taken by only 
one person so far. Experts attribute this low 
uptake to the high price and to doubts about its 
efficacy. If newer gene therapies are to do bet- 
ter, they will have to produce convincing data 
that they are worth the money, Trusheim says. 

For medicines that are already approved, 
one increasingly popular solution is a deal 
between insurers and drug companies that 
ties payments to how well medicines perform. 
Last November, for example, Boston-based 


Harvard Pilgrim Health Care, a major New 
England insurer, announced that it will cover 
treatment for its clients with Repatha (evo- 
locumab), one of a new class of cholesterol- 
lowering medication that is made by Amgen 
and costs $14,000. But if patients don't reach 
pre-agreed cholesterol levels, or if Harvard 
Pilgrim ends up paying more than it has 
budgeted for, Amgen will refund the insurer. 


Networks set up by insurance companies
to gather and share data from health centres
make such deals possible, says Michael
Sherman, chief medical officer at Harvard
Pilgrim. And they are
on the rise around the world: one study found 
‘pay-for-performance’ deals across 14 coun- 
tries in 2013, predominantly in Europe and 
the United States, but also in middle-income 
countries such as China and Brazil. 

These deals may work for some conditions, 
such as haemophilia B, for which several 
drugs might be approved. But for others, such 
as adrenoleukodystrophy, only one company 
is developing a product, so there won't be 
the incentive for companies to negotiate, 
Trusheim says. 

At the BIO meeting, investors and economists
discussed a range of alternative solutions,
including the medical equivalent of a mortgage 
or annuity, in which insurance companies or 
governments might spread the cost of a one- 
time treatment over many years, as long as a 
patient continues to benefit from it. One com- 
plication of such arrangements in the United 
States is that patients often move between insurers,
so it is unclear who would continue to make
these payments on a patient's behalf. 

The difficulties of paying for the fruits of the 
biotechnology revolution are something that 
governments are already struggling with. The 
state of Arkansas last year settled a lawsuit filed 
by three people who said they had been denied 
access to the $300,000 cystic fibrosis drug 
Kalydeco (ivacaftor) because of the cost. And in 
April, the Japanese government imposed a 50% 
price cut on a new hepatitis C treatment, Sovaldi
(sofosbuvir). A US federal judge in Seattle, 
Washington, ruled on 27 May that states 
cannot delay treatment with Sovaldi, which 
costs up to $84,000, because of price concerns. 

But those working on gene therapy are con- 
fident that a solution is out there. “Let's say that 
a gene therapy that really made a world of dif- 
ference in the life of a small child should cost 
a million dollars for one event,” Reilly says. “I
can think of many things in medicine that cost
that much or more, and we don’t think twice
about that.”


EU SCIENCE NEWS


The EU affects science from the collaborative opportunities that the bloc creates to the billions of euros that it distributes for research and innovation. 


EUROPEAN UNION 


Boon or burden: what has
the EU ever done for science?


More than 500 million people and 28 nations make up the European Union. It will lose one of 
its richest, most populous members, if the United Kingdom votes to leave on 23 June. Ahead of 
a possible ‘Brexit’, Nature examines five core ways that the EU shapes the course of research.


SCIENTIST SUPERHIGHWAY 


Science doesn't respect national boundaries, so 
it helps if scientists don't have to either — and 
EU rules and programmes encourage research- 
ers to hit the road. 

EU citizens have the right to live and work 
in any country in the bloc, and the European 
Commission's Marie Skłodowska-Curie actions
pay for 9,000 scientists each year to move to 
or within the EU. The actions fill a gap left by 
national funders, which are often reluctant to 
fund researchers outside their country, says 
Caroline Whelan, a senior scientific officer at 
Science Europe, the Brussels-based organiza- 
tion of national research councils. The EU 
Erasmus exchange programme has transplanted 
more than 3.3 million students, and 470,000 
teaching and administrative staff, since 1987. 

Although there is little information on 
how such programmes affect scientists’ over- 
all mobility, they boost opportunities for 


collaboration. And because Marie Skłodowska-Curie
fellows often return to their home country,
they redistribute skills and knowledge.
“This is fantastic for Eastern Europe and other 
less-well developed countries to build research 
capacity,” says Lidia Borrell-Damian, director
for research and innovation at the European 
University Association in Brussels. 

A 2011-13 study found that 31% of EU aca- 
demics had worked outside their country of 
residence in the previous decade. And lead- 
ing scientists say that hiring from abroad helps 
them to respond to local skills shortages. The 
survey also found that 80% of those who had 
worked internationally saw a positive effect 
on their research skills, and 60% thought that 
mobility had strongly increased their research 
output (see go.nature.com/28wvqta). 

But the experiences were not all positive: 
more academics said that their job options had 
decreased as a result of moving than said that 
opportunities had increased, for example.

Another downside of mobility is that much
of the flow goes just one way, says Maria Helena
Nazaré, a physicist and former rector of the
University of Aveiro in Portugal. “I think that’s
already creating problems.” Countries such
as the United Kingdom, the Netherlands and
Sweden tend to be net attractors for the Marie
Skłodowska-Curie actions, whereas Spain,
Greece and Italy lose talent. Nazaré also notes 
that transferring pension and benefits between 
countries can be tough. 

Still, the commission is committed to further 
greasing the wheels. Funding aimed at encour- 
aging mobility has soared in the past two dec- 
ades to €6.2 billion (US$6.9 billion) in 2014-20 
— and the commission is tackling the pensions 
issue. It is also growing its EURAXESS portal, 
an EU-wide website that lists jobs and support 
for moving researchers, and has revamped its 
‘scientific visa’ package for non-EU researchers. 
Notably, the United Kingdom has opted out of 
the visa, together with Denmark.


EU SPENDING

The European Union has dedicated more than €120 billion (almost 13%) of its 2014-20
budget to research and innovation (R&I). A host of other EU-funded programmes also support
or are connected to R&I activities, but don’t define the amount of their investment.

[Graphic: breakdown of EU research and innovation spending, 2014-20.
Horizon 2020: €74.8 bn, comprising ‘Excellent science’ €24.2 bn (including the European Research Council, €13.1 bn, and the Marie Skłodowska-Curie actions, €6.2 bn); ‘Industrial leadership’ €16.5 bn; ‘Societal challenges’ €28.6 bn; European Institute of Innovation & Technology €2.4 bn; other €3.1 bn.
Structural funds for R&I (building innovation capacity, mainly in least-developed areas of the EU): about €44 bn.
Euratom: €1.6 bn.
Also shown: ITER (thermonuclear fusion project), Copernicus (satellite monitoring for environment and security) and health programmes.
European Research Council: a Europe-wide agency to fund high-risk, high-impact research; half of its funds have gone to the United Kingdom, Germany and France.]


UNIQUE SCIENCE 


Scientists like to complain loudly about some 
aspects of the commission’s ‘Framework’ 
funding programmes, which are dedicated to 
research and innovation (see ‘EU spending’). 

To access a vast pot of cash geared to meet- 
ing ‘Societal Challenges’ — which amounts to 
an estimated €28.6 billion of the €74.8 billion 
available under Horizon 2020, the Framework 
programme for 2014-20 — they must meld 
themselves into large multinational collabora- 
tions, and adjust their research to fit EU stra- 
tegic goals. But these constraints have fostered 
many valuable projects. 

“I am a big fan of these programmes,” says
Nadia Rosenthal, scientific director of the Jack- 
son Laboratory in Bar Harbor, Maine, who 
has collaborated with several EU consortia on 
mouse-genetics projects, which she says gen- 
erated world-class science. “The coordination 
of talents they can achieve would be very hard 
to pull off in the United States — or in the UK 
alone, if it were not connected to Europe.”

Take research into the health effects of low- 
dose radiation, which people may encounter 
during a CT scan or if they live within a few 
tens of kilometres of the site of the Fukushima 
disaster in Japan. So small are the risks — if they 
exist at all — that such research is low on most
funding agencies’ list of priorities.

But the issue is of perennial concern to the 
public. And studying it requires collaboration 
between radiation-protection agencies and 
academics, as well as the use of large data sets, 
which can be gathered only by multiple collabo- 
rating nations. 

These factors make low-dose-radiation stud- 
ies perfect fodder for EU funding, says Thomas 
Jung, head of radiation protection and health at 
the German Federal Office of Radiation Protec- 
tion in Munich, which has participated in the 
series of low-dose-radiation projects that the 
commission has supported since 2010. 

Societal Challenges funding has also sup- 
ported projects that others shy away from, such 
as transplanting cells derived from the brains 
of fetuses into the brains of people with Par- 
kinson’s disease. In 2003, researchers around 
the world abandoned this controversial line of 
research — which tries to replace the neurons 
whose loss causes the illness's symptoms — after 
many trial participants failed to benefit and no 
one could work out why. Then, in 2014, the 
commission-funded TRANSEURO trial began. 

TRANSEURO aims to transplant neurons 
into 150 people with Parkinson’s in the United 
Kingdom, Sweden, France and Germany using 
harmonized clinical protocols to help establish
which conditions work best. The large collaboration,
which joins 14 biomedical laboratories,
clinics and companies, is essential, says
TRANSEURO’s coordinator, neurologist Roger Barker
at the University of Cambridge, UK. “Without
the EU, I doubt this would have happened.”

Trust between companies is crucial to 
the Advanced Immunization Technologies 
(ADITEC) project, which aims to create a 
generic toolbox to speed up vaccine develop- 
ment. Under the confidentiality agreements 
of the consortium, which the commission has 
funded since 2011, companies are comfortable 
sharing the components of their proprietary 
vaccines. The project has already produced the 
first direct comparison of different companies’
‘adjuvants’, substances that strengthen immune
responses (N. P. H. Knudsen et al. Sci. Rep. 6,
19570; 2016). “We had always thought it would
be impossible to compare them,” says ADITEC
coordinator Rino Rappuoli, chief scientist of 
GSK Vaccines in Siena, Italy. 


LIFTED THE EAST 


In late 2000, when NATO sponsored a meeting 
on science in Central and Eastern Europe, much 
of the region was a world apart from the EU. 
Years of communist thinking had nourished
the illusion that the mere existence of institutes
and research facilities was more important than 
their actual performance. 

Attitudes have changed, partly thanks to 
the EU, which absorbed the Czech Republic, 
Estonia, Hungary, Latvia, Lithuania, Poland, 
Slovakia and Slovenia in 2004, then Bulgaria 
and Romania in 2007 and Croatia in 2013. 

These countries have had a low rate of suc- 
cess in winning grants from the Framework 
programmes. But all of the former communist 
states are recipients of the commission's ‘struc- 
tural funds’ — subsidies designed to reduce 
social and economic disparities, a goal of the 
EU. How the funds are used is decided locally, 
but of the €170 billion available for ‘cohesion 
and regional development’ in 2007-13, the 
commission pushed for €20 billion to be spent 
on research. In 2014-20, almost €44 billion is 
meant to be used for science and innovation in 
poorer regions. 

The cash has been most effective when used 
to refurbish universities and provide labs with 
the equipment needed to train students and 
entice researchers to stay, says Peter Tindemans, 
secretary-general of science-advocacy group 
EuroScience in Strasbourg, France. 

The funds have also financed the 
€850-million Extreme Light Infrastructure, a 
pan-European laser facility under construc- 
tion at sites in the Czech Republic, Hungary and 
Romania. The facility is expected to attract lead- 
ing talent from around the world to the region, 
but Tindemans cautions that improvements to 
the research environment must come first. “You 
can’t jump-start scientific development solely 
with large infrastructures,” he says. 


rUOIT ENING EAGELLENGE 
To win cash from EU funding programmes, 
researchers must often fit their work into 
broader societal or economic goals. But one 
corner of the European funding apparatus is all 
about science for science’s sake. 

Set up in 2007 to raise the quality of research 
across Europe, the European Research Council 
(ERC) awards generous grants that are open to 
any discipline, come with minimum bureau- 
cracy and are judged solely on the quality of 
the application. 

The ERC budget has grown from €7.5 billion 
in 2007-13 to €13.1 billion for 2014-20. At up 
to €2.5 million over 5 years per researcher, its 
grants are longer and larger than those of most 
national funders. The approach seems to work: 
7% of ERC-generated papers come in the top 
1% of the most highly cited articles by disci- 
pline, publication type and year. 

Not everyone is happy with the ‘excellence at 
all costs’ approach. Since the ERC’s inception, 
half of the grants it awarded under its three core 
schemes have gone to just three countries: the 
United Kingdom, Germany and France. 

But the ERC system lifts the quality of 
research beyond the projects that it funds. 


EU SCIENCE 


EUROPEAN, BUT NOT EU 


Although separate, CERN and ESA receive EU funds. 


Before the EU began to have a major role in 
coordinating Europe-wide research in the 
1990s, the task fell mainly to pan-European 
research organizations such as the CERN 
particle-physics laboratory. 

Established by treaty in 1952 by 
11 countries, CERN, near Geneva, 
Switzerland, was born in the same post- 
war spirit of peace as led to the formation 
of the EU. But the lab pre-dates the EU’s 
main forerunner, the European Economic 
Community, which had no remit for 


Rosetta’s Philae lander touches down on a comet. 


Either in an attempt to win more of its grants 
or simply inspired by the ERC, member states 
are redesigning national policies to make their 
science more competitive, says Jose Labastida, 
head of the ERC’s scientific department. He 
cites Poland’s National Science Centre, set up 
in 2011, as an example. 

And 17 countries have run schemes that 
fund ERC runners-up — applicants who met 
the quality threshold but were unsuccess- 
ful — essentially reusing the agency’s high- 
quality peer-review process. “The ERC has 
raised the scientific level all over Europe,” 
says Catherine Cesarsky, an astronomer at the 
French Atomic Energy Commission near Paris. 


Science thrives on collaboration — and the EU 
has partnered with other agencies (see ‘Euro- 
pean, but not EU’) and creates myriad opportu- 
nities for researchers to pool ideas and cooperate. 

Most of the funding for the EU’s Framework 
programmes is reserved for projects in which 
partnerships are formed by at least three organi- 
zations from different countries. The last pro- 
gramme, FP7, which ran from 2007 to 2013, 
spent €41.7 billion of its €50.5-billion budget 
on some 26,000 joint projects, generating 
more than 500,000 pairs of collaborative links 
between research organizations, according to 
the commission. The Framework programmes 
also fund mobility grants that foster collabora- 
tion. 

In less-well-off countries, meanwhile, 
structural funds equip researchers to work 
with their counterparts in more scientifically 


© 2016 Macmillan Publishers Limited. All rights reserved. 


research, by about five years. CERN now has 
21 member states and is a major recipient 
of EU funds, including for a 2020 upgrade 
of its Large Hadron Collider, which scientists 
used to discover the Higgs boson. 

Another organization that grew up 
alongside the EU is the European Space 
Agency (ESA). It arose from a 1975 merger 
between the European Space Research 
Organisation and the European Launch 
Development Organisation. Both were 
created in the 1960s to guarantee Europe 
independent access to space. 

ESA has racked up a string of successes, 
including the Rosetta mission that put a 
lander on a comet in 2014. The EU is now the 
biggest single contributor to the 22-nation- 
strong agency, accounting for some 20% 
of its budget. ESA and the EU are partners 
in the multibillion-dollar Copernicus Earth 
observation system and in the Galileo global 
satellite navigation system. 


developed nations, says Rémi Barré, an emeri- 
tus researcher at the National Conservatory of 
Arts and Crafts in Paris. 

The gradual political, economic and research 
integration of the EU’s member states has 
created an environment that is conducive to 
collaboration, according to geneticist Paul 
Nurse, head of the Crick Institute in London. 
Research is now embedded across the EU’s 
activities, from the bloc’s negotiation of the 
COP21 climate accord in December 2015 to its 
environmental-protection policies and regula- 
tory bodies such as the London-based European 
Medicines Agency. 

Contact between science ministers from 
different member states and researchers has 
become the norm, says Frank Gannon, for- 
mer head of the intergovernmental European 
Molecular Biology Organization. By con- 
trast, he recalls how fragmented European 
research was a few decades ago when he was a 
researcher in Ireland. “The sense of isolation of 
a researcher was massive.” 


Reporting by Alison Abbott, Declan Butler, 
Elizabeth Gibney, Quirin Schiermeier and 
Richard Van Noorden 


The News Feature ‘The material code’ 
(Nature 533, 22-25; 2016) did not make 

it clear that the director of the Materials 
Genome Project is Kristin Persson, and that 
she has an affiliation with the University of 
California, Berkeley. 


A DECADE OF iPS CELLS 


Induced pluripotent stem cells were supposed to herald a medical revolution. 
But ten years after their discovery, they are transforming biological research instead. 


BY MEGAN SCUDELLARI 


“We have colonies.” 

Shinya Yamanaka looked up in surprise at the 
postdoc who had spoken. “We have colonies,” 
Kazutoshi Takahashi said again. Yamanaka jumped from his desk and 
followed Takahashi to their tissue-culture room, at Kyoto University 
in Japan. Under a microscope, they saw tiny clusters of cells — the 
culmination of five years of work and an achievement that Yamanaka 
hadn't even been sure was possible. 

Two weeks earlier, Takahashi had taken skin cells from adult mice and 
infected them with a virus designed to introduce 24 carefully chosen 
genes. Now, the cells had been transformed. They looked and behaved like 
embryonic stem (ES) cells — ‘pluripotent’ cells, with the ability to develop 
into skin, nerve, muscle or practically any other cell type. Yamanaka gazed 
at the cellular alchemy before him. “At that moment, I thought, ‘This must 
be some kind of mistake’,” he recalls. He asked Takahashi to perform the 
experiment again — and again. Each time, it worked. 

Over the next two months, Takahashi narrowed down the genes to 
just four that were needed to wind back the developmental clock. In June 
2006, Yamanaka presented the results to a stunned room of scientists at 
the annual meeting of the International Society for Stem Cell Research 
in Toronto, Canada. He called the cells ‘ES-like cells’, but would later refer 
to them as induced pluripotent stem cells, or iPS cells. “Many people just 
didn't believe it,” says Rudolf Jaenisch, a biologist at the Massachusetts 
Institute of Technology in Cambridge, who was in the room. But Jaenisch 
knew and trusted Yamanaka’s work, and thought it was “ingenious”. 

The cells promised to be a boon for regenerative medicine: researchers 
might take a person’s skin, blood or other cells, reprogram them into iPS 
cells, and then use those to grow liver cells, neurons or whatever was 
needed to treat a disease. This personalized therapy would get around 
the risk of immune rejection, and sidestep the ethical concerns of using 
cells derived from embryos. 

Ten years on, the goals have shifted — in part because those therapies 
have proved challenging to develop. The only clinical trial using iPS cells 
was halted in 2015 after just one person had received a treatment. 

But iPS cells have made their mark in a different way. They have 
become an important tool for modelling and investigating human 
diseases, as well as for screening drugs. Improved ways of making the 
cells, along with gene-editing technologies, have turned iPS cells into a 
lab workhorse — providing an unlimited supply of once-inaccessible 
human tissues for research. This has been especially valuable in the fields 
of human development and neurological diseases, says Guo-li Ming, a 
neuroscientist at Johns Hopkins University in Baltimore, Maryland, 
who has been using iPS cells since 2006. 

The field is still experiencing growing pains. As more and more labs 
adopt iPS cells, researchers struggle with consistency. “The greatest chal- 
lenge is to get everyone on the same page with quality control,” says 
Jeanne Loring, a stem-cell biologist at the Scripps Research Institute 
in La Jolla, California. “There are still papers coming out where people 
have done something remarkable with one cell line, and it turns out 
nobody else can do it,” she says. “We've got all the technology. We just 
need to have people use it right.” 


FROM SKIN TO EYES 

Six weeks after presenting their results, Yamanaka and Takahashi pub- 
lished the identities of the genes responsible for reprogramming adult 
cells: Oct3/4, Sox2, Klf4 and c-Myc. Over the next year, three laboratories, 
including Yamanaka’s, confirmed the results and improved the repro- 
gramming method. Within another six months, Yamanaka and James 
Thomson at the University of Wisconsin-Madison managed to repro- 
gram adult cells from humans. Labs around the world rushed to use the 
technique: by late 2009, some 300 papers on iPS cells had been published. 

Many labs focused on working out what types of adult cell could be 
reprogrammed, and what the resulting iPS cells could be transformed 
into. Others sought to further improve the reprogramming recipe, 
initially by eliminating the need to use c-Myc, a gene with the potential 
to turn some cells cancerous, and later by delivering the genes with- 
out them integrating into the genome, a looming safety concern for 
iPS-cell-based therapies. 

Another big question was how similar iPS cells really were to ES cells. 
Differences started to emerge. Scientists discovered that iPS cells retain 
an ‘epigenetic memory’ — a pattern of chemical marks on their DNA 
that reflects their original cell type. But experts argue that such changes 
should not affect the cells’ use in therapies. “There might be some differ- 
ences from ES cells, but I believe they are really not relevant,” says Jaenisch. 

By 2012, when Yamanaka won half of the Nobel Prize in Physiology or 
Medicine for the work, the first human trial of an iPS-cell-based therapy 
was being planned. Masayo Takahashi, an ophthalmologist at the RIKEN 
Center for Developmental Biology (CDB) in Kobe, Japan, had been 
developing ES-cell-based treatments for retinal diseases when Yamanaka 
first published his reprogramming method. She quickly switched to iPS 
cells, and eventually began to collaborate with Yamanaka. 

ILLUSTRATION BY ANDY POTTS; PHOTO: CHRIS GOODFELLOW/GLADSTONE INST. 
Shinya Yamanaka won a Nobel prize for his work on reprogramming adult cells to an embryonic-like state. 

In 2013, her team made iPS cells from the skin cells of two people 
with age-related macular degeneration, an eye condition that can lead to 
blindness, and used them to create sheets of retinal pigment epithelium 
(RPE) cells for a clinical trial. Not long after, CDB researchers working on 
another cell-reprogramming technique — stimulus-triggered acquisition 
of pluripotency, or STAP — came under investigation for misconduct (see 
go.nature.com/1xbnlzn). Although unconnected to the iPS-cell trial, the 
furore made it difficult for Takahashi to advance her study: it created a 
“headwind in the calm sea” in which she had been working, she says. Yet 
her team pushed ahead, and on 12 Septem- 
ber 2014, doctors implanted the first RPE 
sheets into the right eye of a woman in her 
seventies. Takahashi says that the therapy 
halted the woman’s macular degeneration 
and brightened her vision. 

But as the lab prepared to treat the sec- 
ond trial participant, Yamanaka’s team 
identified two small genetic changes in 
both the patient’s iPS cells and the RPE 
cells derived from them. There was no evi- 
dence that either mutation was associated 
with tumour formation, yet “to be on the safe side” Yamanaka advised 
Takahashi to put the trial on hold. She did. 

The suspension gave pause to other researchers interested in the field, 
says Paul Knoepfler, a stem-cell biologist at the University of California, 
Davis: “The world is watching to see how it progresses.” But the difficul- 
ties iPS cells have faced getting to the clinic aren’t that unusual, says 
David Brindley, who studies stem-cell regulation and manufacturing at 
the University of Oxford, UK. It generally takes about 20 years to move a 
scientific discovery to clinical and commercial adoption, so iPS cells “are 
following roughly the same trajectory”, he says. 

In the United States, the Astellas Institute for Regenerative Medicine 
in Marlborough, Massachusetts (formerly Advanced Cell Technology), 
has several iPS-cell-based therapies in its pipeline, including ones for 
macular degeneration and glaucoma, says chief scientific officer Robert 
Lanza. For any such therapy, it takes years to work out a suitable method 
for making the right cell types in large enough quantities, and with 
enough purity. “iPS cells are the most complex and dynamic therapies 
that have ever been proposed for the clinic,” says Lanza. “I’m the first 
one who wants to see these cells in the clinic, but an abundance of cau- 
tion is needed.” 

The other great challenge is working out what will be required to get 
such treatments approved. Loring hopes to start an iPS-cell-therapy trial 
for Parkinson's disease in the next two years. But it won't be easy: the treat- 
ment uses cells derived from individual patients, and Loring plans to do a 
complex series of checks and validations for each cell line to demonstrate 
its safety to the US Food and Drug Administration. 

Developing and testing a therapy in even one person has been edu- 
cational, says Yamanaka: it took one year and US$1 million. He expects 
future therapies to use donor-derived iPS cells from a cell bank, rather 
than making them for each patient. 

Takahashi plans to compare banked 
iPS cells side-by-side with those derived 
from patients, to observe any differences 
in immune reaction. She intends to apply 
to the Japanese government to resume her 
macular-degeneration trial “very soon”, but 
when asked, would not specify a timeline. 


CELLULAR IMPROVEMENTS 

Although cell therapy has suffered set- 
backs, other areas of research have blos- 
somed. Methods for making iPS cells “are more refined and elegant than 
they were even five years ago”, says Knoepfler. 

But most reprogramming techniques are inefficient: only a small frac- 
tion of cells end up fully reprogrammed. And, like all cell lines, iPS cells 
vary from one strain to another. That has made it hard to establish controls 
in experiments. 

Marc Tessier-Lavigne, a neuroscientist at the Rockefeller University 
in New York City, confronted this challenge with colleagues at the New 
York Stem Cell Foundation when they began to work with iPS cells made 
from people with early-onset Alzheimer’s disease and frontotemporal 
dementia. They quickly realized that comparing a patient's iPS cells with 
those from a healthy control didn’t work — the cells behaved too differ- 
ently in culture, probably the result of disparities in genetic background 
or gene expression. “So we turned to gene editing,” says Tessier-Lavigne. 

The CRISPR-Cas9 gene-editing tool, which has gained huge popular- 
ity in recent years, has enabled researchers to introduce disease-associ- 
ated mutations into a sample of iPS cells and then compare them with 
the original, unedited cell lines. Jaenisch’s lab uses CRISPR-Cas9 with 
iPS cells daily. “We can do any manipulations we want to do,” he says. 

New, refined gene-editing methods are proving even more useful. In 
April, for example, Dominik Paquet and Dylan Kwart in Tessier-Lavigne’s 
lab demonstrated a technique for introducing specific point mutations 
into iPS cells using CRISPR, and editing just one copy of a gene, rather 
than both. This allowed them to generate cells with precise combinations 
of Alzheimer’s-associated mutations, and to study the effects. 


INDUCING A REVOLUTION 

Shinya Yamanaka’s discovery spurred thousands 
of publications on the identity, characteristics and 
many uses of iPS cells in research. 

Yamanaka, of Kyoto University in Japan, reveals that 
just four genes can reprogram adult mouse cells 
into embryonic-like, ‘pluripotent’ iPS cells. 

Yamanaka and James Thomson at the University of 
Wisconsin-Madison both report creation of human iPS cells. 

Skin cells from people 
with Parkinson’s disease 
are transformed into 
dopamine-producing 
neurons in a bid to model 
the disease in a dish. 

Several teams start to demonstrate that iPS cells can 
be created without inserting genes into the genome. 

Yamanaka (pictured with King Carl XVI Gustaf of Sweden) 
and John Gurdon at the University of Cambridge, UK, receive 
the Nobel Prize in Physiology or Medicine for revealing that 
adult cells can be reprogrammed. 

Researchers in Japan begin the first, and so far only, 
test of iPS-derived cells in people, attempting to 
treat a degenerative eye condition. 

The Japanese trial is halted. 

But because iPS cells resemble embryonic cells, they are not always 
ideal for studying late-onset diseases such as dementia. So researchers 
are exploring ways to stress cells or introduce proteins that age them 
prematurely. “It’s a valid concern that hasn't been resolved, but there are 
a number of approaches to really try to tackle it,” says Tessier-Lavigne. 

The fact that iPS cells mimic early human development has proved 
useful in another field — the sprint to discover whether and how infec- 
tion with the Zika virus in pregnant women might lead to microcephaly, 
a condition in which a baby’s head is smaller than expected. Ming and 
her colleagues have used iPS cells to create brain organoids — 3D bits 
of tissue that resemble developing organs. When they exposed these to 
Zika, they found that the pathogen preferentially infects neural stem 
cells over newly formed neurons, leading to increased death of the neural 
stem cells and a decrease in the volume of a layer of neurons in the cortex, 
resembling microcephaly. 

Other groups have used iPS cells to create organoids such as mini-guts 
and mini-livers, and the list of disease-related discoveries using iPS cells 
is growing. It includes showing how a gene duplication in glaucoma 
causes the death of nerve-cell clusters, and recapitulating genetic and 
cellular alterations associated with Huntington’s disease. 

iPS cells have also been used with some success in drug discovery: 
they provide a plentiful source of patient-derived cells to screen or test 
experimental drugs. In 2012, for example, neural stem cells made from 
people with a nerve-cell-development disease were used to screen nearly 
7,000 small molecules and identify a potential drug for the condition. 
And this year, a team reported generating sensory neurons from iPS 
cells made from people with an inherited pain disorder. The researchers 
showed that a sodium-blocking compound reduced the excitability of 
neurons and decreased pain in the patients. It would be great to use iPS 
cells to predict whether people will respond to a particular drug, says 
Edward Stevens, a research fellow at the Pfizer Neuroscience and Pain 
Research Unit in Cambridge, UK, who led the work, but there will need 
to be much more evidence that such a strategy works. 

Even after a decade of reprogramming cells (see ‘Inducing a revolu- 
tion’), researchers don't know in detail how the process actually occurs. 
For now, the field is focused on systematically verifying cell lines’ iden- 
tity and safety, by checking their genomes, gene-expression patterns 
and more. One such effort, the European Bank for Induced Pluripo- 
tent Stem Cells, centred in Cambridge, UK, publicly launched its cata- 
logue of standardized iPS cells for use in disease modelling this March. 
Yamanaka is also involved in banking iPS cells for future therapies, col- 
lecting varieties that would be immunologically compatible across a 
broad population. 

The greatest future challenges, he says, are not scientific. Researchers 
are going to need strong support from the pharmaceutical industry and 
governments to move forward with cell therapies; for drug discovery and 
disease modelling, researchers must be persistent and patient. iPS cells 
can only shorten the discovery process, not skip it, he says. “There’s no 
magic. With iPS cells or any new technology, it still takes a long time.” 


Megan Scudellari is a science journalist in Boston, Massachusetts. 


New tricks for old drugs 


Faced with skyrocketing costs for developing 
new drugs, researchers are looking at ways 
to repurpose older ones — and even some that 
failed in initial trials. 


BY NICOLA NOSENGO 


When a young physician opted to do a 
stint in Grant Churchill's phar- 
macology lab as part of his medical 
training, he asked for a task that would quickly 
teach him the tools of the trade. “So I thought, 
‘I have a good project for you’,” says Churchill. 
That was in 2010, and Churchill's group at 
the University of Oxford, UK, was looking for 
ways to treat bipolar disorder without using 
lithium — a drug that often works well, but is 
plagued with side effects. So Churchill asked the 
physician, Justyn Thomas, to screen all of the 
450 compounds in the US National Institutes of 
Health (NIH) Clinical Collection, a library of 
drugs that had passed safety tests in humans 
but, for various reasons, had never reached the 
market. “That stuff is just sitting there, and it 
doesn't take much effort,” says Churchill, “so you 
think you just have to try.” 

Thomas pipetted a few drops of each 
compound into Petri dishes filled with bacteria 
that had been genetically engineered to manu- 
facture the human enzyme suppressed by lith- 
ium — and eventually got a hit. A compound 
originally intended for people who had experi- 
enced a stroke also damped production of the 
enzyme, suggesting that it might give patients 
the same benefits as lithium. After experiments 
in mice showed that the drug, ebselen, could get 
through the chemical barrier that protects the 
brain — something only a few compounds can 
do — Churchill's group did a small-scale trial 
and found that ebselen could be used safely in 
healthy volunteers. 

The University of Oxford has now teamed up 
with a pharmaceutical company to run clini- 
cal trials of ebselen for bipolar disorder. The 
researchers are able to skip the phase I safety 
trials because the drug had already passed 
them, and are going straight to phase II: test- 
ing the drug's efficacy against bipolar disorder. 
Churchill is well aware that ebselen could fail 
this trial, or the larger, more stringent ones 
needed to test whether the drug works better 
than lithium. But he is already proud of what 
his team has achieved. “As an academic group 
with no company money,” he says, “we were able 
to go from identification of the molecule to a 
human trial with a very limited budget.” 

Such stories are becoming more and more 
common: taking drugs that have been devel- 
oped for one disorder and ‘repositioning’ them 
to tackle another is an increasingly important 
strategy for researchers in industry and aca- 
demia alike. These efforts take inspiration from 
some classic success stories. One is sildenafil, 
an angina medication developed in 1989 that 


geod 


is now marketed as Viagra and used to treat 
erectile dysfunction. Another is azidothymi- 
dine, which failed as a chemotherapy drug but 
emerged in the 1980s as a therapy for HIV. 

Increasingly, the serendipity responsible 
for those earlier discoveries is giving way to 
systematic searches for candidates. Partly, this 
is the result of advances in technology. These 
include big-data analytics that can now uncover 
molecular similarities between diseases; com- 
putational models that can predict which 
compounds might take advantage of those 
similarities; and high-throughput screening 
systems that can quickly test many drugs against 
different cell lines. 

But for the pharmaceutical industry, the real 
impetus is economics. Getting a drug to mar- 
ket currently takes 13-15 years and between 
US$2 billion and $3 billion on average, and the 
costs are going up — even though the number 
of drugs approved every year per dollar spent 
on development has remained flat or decreased 
for most of the past decade (see ‘Eroom’s law’). 
The 3,000 or so drugs that have been approved 
by at least one country therefore represent a vast 
untapped resource if they can be used against 
another condition — as do the thousands more 
that stalled in clinical trials. Many of them, like 
ebselen, can probably skip the phase I trials and 
pose a substantially lower risk of producing 
dramatic side effects in later phases — thereby 
slashing those development costs compared 
with completely new compounds. Some esti- 
mates suggest that repositioning a drug costs 
on average $300 million and takes around 
6.5 years. “My feeling is that the proportion of 
drugs that in theory could be repositioned is 
probably around 75%,” says Bernard Munos, 
a senior fellow at FasterCures, a drug-devel- 
opment advocacy organi- 
zation in Washington DC, 
and a member of the advi- 
sory council of the National 
Center for Advancing 
Translational Sciences 
(NCATS) at the NIH. 

But the fraction is prob- 
ably quite a bit smaller 
in practice, he concedes. 
Repositioned drugs still 
have to make it through 
phase II and III clinical tri- 
als for their new purpose 
— trials that respectively 
eliminate 68% and 40% of 
the compounds that get 
that far. And many drugs 
also face economic barriers, such as being off- 
patent, that could dissuade pharmaceutical 
companies from getting involved. “Can some 
repositioning projects work? Sure. Can it work 
systematically as a profitable business model? 
That, I don't believe,” says John LaMattina, 
a former president of research and develop- 
ment at Pfizer, and now a senior partner at the 
health-care technology research firm PureTech 
in Boston, Massachusetts. 
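
As a rough worked example of the attrition figures quoted above: if phase II eliminates 68% of the compounds that reach it and phase III eliminates 40% of those that remain, only about one in five candidates entering phase II would be expected to clear both. The sketch below assumes that the two rates are independent and that they apply unchanged to repositioned drugs, neither of which the article states; the function name is illustrative.

```python
from math import prod


def fraction_surviving(elimination_rates):
    """Return the fraction of compounds left after each phase removes its share."""
    return prod(1.0 - rate for rate in elimination_rates)


# Phase II and phase III elimination rates quoted in the text.
surviving = fraction_surviving([0.68, 0.40])
print(f"{surviving:.1%} of compounds entering phase II clear both phases")
```

On these assumptions the combined survival is 0.32 × 0.60, which is why the practical share of repositionable drugs is likely to sit well below the theoretical 75%.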

Nonetheless, some 30 articles on cases of drug 
repositioning are now being published in scien- 
tific journals every month — a sixfold increase 
since 2011. A dedicated journal, Drug Repurpos- 
ing, Rescue and Repositioning, was launched last 
year. Three or four drug-repositioning compa- 
nies are created every year. And some estimates 
suggest that the number of repositioned drugs 
entering the regulatory-approval pipeline is 
rising, and could account for about 30% of all 
drugs approved every year. 

“We've gone past the stage where we had 
to explain to everyone what we were talking 
about,” says Andreas Persidis, chief executive 
of Biovista in Charlottesville, Virginia, one 
of about 40 companies that now specialize in 
drug repositioning. “Now it’s a recognized field, 
and we’re in the typical second stage of scien- 
tific trends, when lots of people jump on the 
bandwagon.” 


STARTING POINT 

The easiest target for repositioning is generic 
drugs. They have been on the market for 
years, their safety profiles are well known and 
they are easy and cheap to obtain for clini- 
cal trials because their original patents have 
expired. And, if they involve new formula- 
tions or applications to new disorders, they can 
still be covered by patents or be granted three 
years of market exclusivity by the US Food and 
Drug Administration (FDA). So they remain 
attractive targets for companies. 

Biovista, for example, starts by automati- 
cally scanning through all the publicly avail- 
able information on generic compounds, from 
scientific papers and patents to the database 
of adverse events compiled by the FDA. Then 
it creates a kind of cellular 
social network, mapping 
all the connections that it 
has found between drugs, 
molecular pathways, genes 
and other biologically rel- 
evant entities. The thinking 
is that the more connections 
that a drug has in common 
with a disease, the more 
likely it is to be a good can- 
didate for repositioning. 
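
The idea of ranking drugs by the connections they share with a disease can be illustrated with a toy example. This is only a sketch of the general principle, not Biovista's actual method, which the article does not describe in detail; the entity names, the link data and both function names are hypothetical.

```python
def shared_connections(links, drug, disease):
    """Count the entities (pathways, genes, ...) linked to both the drug and the disease."""
    return len(links.get(drug, set()) & links.get(disease, set()))


def rank_candidates(links, drugs, disease):
    """Order drugs from most to fewest connections shared with the disease."""
    return sorted(
        drugs,
        key=lambda drug: shared_connections(links, drug, disease),
        reverse=True,
    )


# Entirely hypothetical links, for illustration only.
links = {
    "drug_A": {"pathway_1", "gene_X", "gene_Y"},
    "drug_B": {"pathway_2"},
    "disease_Z": {"pathway_1", "gene_X", "pathway_3"},
}

print(rank_candidates(links, ["drug_B", "drug_A"], "disease_Z"))
```

Here drug_A shares two entities with disease_Z and drug_B shares none, so drug_A is ranked first. A real system would presumably weight connections by the strength of the evidence rather than simply counting them.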

This is how Biovista 
discovered that pirlindole 
— a generic antidepressant 
that was developed and is 
used in Russia — might be 
a potential treatment for 
multiple sclerosis. In mouse models, the drug 
slows down the progression of the disease, and 
is now about to progress to a proof-of-concept 
study in humans. The company has secured a 
new patent on pirlindole, as well as on another 
candidate treatment for multiple sclerosis, still 
another for epilepsy and three for cancer. 

Another source of knowledge is what doctors 
see in the clinic. “Every drug that’s been around 
for some years has about 20 off-label uses, two- 
thirds of which are started by practising phy- 
sicians,” says Moshe Rogosnitzky, who heads 
one of the first academic centres for drug repo- 
sitioning, established last year at Ariel Univer- 
sity in Israel. “But the other doctors don’t know 
about them, because clinicians have a hard time 
publishing their results.” 

So Rogosnitzky and his group systemati- 
cally canvass these practitioners in Israel and 
12 other countries, try to work out a mechanism 
of action for each reported effect and help the 
physicians to get patent protection and attract 
money for further trials. They also help more 
people to get the drug on an off-label basis. Next 
July, the group will start a phase II trial to reposi- 
tion a generic angina drug, called dipyridamole, 
to treat dry eye disease, a frequent complication 
for people who have undergone bone-marrow 
transplants and risk losing their sight because 
their eyes stop producing tears. 


FAILED BUT NOT FORGOTTEN 

Another favourite target is the long list of failed 
drugs. Most of them pass phase I trials, but do 
not get past phase II because they don't have the 
same effect in humans that they had in animals. 
“Still, there are not many compounds that have 
some biological activity and are safe in humans, 
so for heaven's sake let’s try to do something 
else with them,” says Gregory Petsko, a
neuroscientist at Weill Cornell Medical College
in New York City. The problem is that, apart
from really old ones like ebselen, they tend to
be locked in the industry’s drawers.

EROOM'S LAW

The efficiency of research and development of new drugs in the United States halves every
nine years or so. Drug developers sometimes call this Eroom’s law: Moore’s law for
microprocessors in reverse. Repositioning drugs could help to counter this decline.

A SHORTER TIMESCALE

Because most repositioned drugs have already passed the early
phases of development and clinical testing, they can potentially win
approval in less than half the time and at one-quarter of the cost.

“Sometimes, companies make official 
announcements when they abandon a mol- 
ecule, but in most cases they don't,” says 
Hermann Mucke, a biochemist who in 2000 
founded the Vienna-based firm HM Pharma 
Consultancy, which now makes a business 
from hunting through discontinued com- 
pounds. “So we monitor a number of sources 
and look for drugs that have quietly disap- 
peared from pipelines, or for clinical trials that 
were announced and never led to a publica- 
tion.” When they feel there may be room for 
repositioning, Mucke and his staff approach 
the owner of the drug and try to strike a deal 
that will allow them to do further tests and 
development — and share in any profits that 
result. They are also creating a database of 
drugs that have been approved but are no 
longer manufactured, and of drugs that have 
been abandoned during development. “We are 
developing it for our own use,” he says. “But if
we can find investors, we would like to turn it 
into a public resource.” 

In the absence of such a public resource, 
both the UK Medical Research Council (MRC) 
and NCATS have struck deals with major 
pharmaceutical companies, convincing them 
to pick some abandoned compounds from 
their pipelines and release enough informa- 
tion for academic groups to work out whether 
repositioning might be feasible. “There’s a lot 
of research that could be done but is not hap- 
pening, simply because academic people are 
not aware of what pharmaceutical companies 
are doing,” says Christine Colvis, who heads
the NCATS drug-repurposing effort.

316 | NATURE | VOL 534 | 16 JUNE 2016

[Figure: timelines compared. Conventional drug development: 12-16 years, ~$1 billion to $2 billion. Drug repositioning: ~6 years, ~$300 million. Stage labels shown: Phase I; Phase II; Phase III, 2 years; FDA approval, 1-2 years.]

Although the MRC programme officially 
aims to help researchers to understand the 
biology of diseases, many of the groups that 
it funds end up doing interesting reposition- 
ing work, too. At the University of Manches- 
ter, UK, for example, physician-scientist 
Jacky Smith is testing a compound that was 
originally developed to treat heartburn to see 
whether it can help people with chronic cough. 

The NCATS programme has drawn criti- 
cism, however. “It’s good that some groups 
have had access to some drugs, but that leaves 
out the vast majority of us,” says Petsko. “And
there’s no guarantee that the compounds in 
those lists were really the most interesting 
ones.” NCATS spent $12.7 million on 9 pro- 
jects in 2013, and 8 of those have progressed to 
phase II trials. They include a former psoriasis 
drug that is being tested as a smoking-cessa- 
tion therapy, a failed diabetes pill that is getting 
a second chance as a treatment for alcoholism, 
and a failed cancer drug that is now a poten- 
tial therapy for Alzheimer’s disease. A year 
from now, says Colvis, the first results of those 
studies will be published, and if all goes well, 
at least some of them will progress further. In 
the meantime, NCATS invested $2 million last 
year in another round of projects. 


TURNING THE TABLES 

In the long run, says Munos, drug repositioning 
could disrupt big pharma’s business model in 
much the same way that digital music upended 
big record companies in the 1990s. “When cur- 
rent efforts start resulting in a flow of market 
approval,” he says, “and we see many small
companies developing drugs for a few mil- 
lions of dollars, there will be a lot of interesting 
competition with traditional companies.”

That optimism is not universal, however. 
“Not all repositioning projects that work on 
paper are really feasible,” says Tudor Oprea, a 
bioinformatics researcher at the University of 
New Mexico in Albuquerque who monitors the 
field in addition to doing his own reposition- 
ing work. For instance, he says, side effects that 
would be acceptable for a life-threatening dis- 
ease might not be acceptable for a chronic one. 
And the standard business case for reposition- 
ing — that costs are slashed because safety tests 
are already in the bag — works only if the dose 
and mode of administration remain similar. 
If the new disease requires a significantly higher 
dose, the drug will have to go through phase I 
trials again. In the end, says Oprea, development 
costs can be similar to those for a new molecule.

LaMattina wonders whether the opportuni- 
ties are really as plentiful as proponents sug- 
gest. When companies test a new molecule, 
he says, they do a wide array of tests on vari- 
ous targets and cell types because they want 
to anticipate the effects. So if a drug really has
interesting effects beyond the expected one, 
industry scientists will find out for themselves. 
“It’s a bit naive to think that companies overlook
all these opportunities to do business,” he
says. “It’s typically people in academia, who 
don't know what happens in the industry, who 
think they can do it.”

But Persidis argues that many companies 
are too specialized to benefit from all the 
repositioning opportunities they have in- 
house. They may have expertise and market 
penetration in neurology, but not in oncology, 
and moving a drug from one field to the other 
could be out of their strategy. “People like us 
keep getting business,” he says, “and that’s 
because larger companies do appreciate having 
an external partner looking at their drugs from 
a different angle.” 

In the end, says Atul Butte, a bioinformati- 
cian at the University of California, San Fran- 
cisco, drug repositioning is a complement to 
the discovery of new molecules, rather than 
an alternative. “We just need more of both,” 
he says. “In modern medicine, we're becoming 
better at figuring out that each disease is actu- 
ally five or ten different ones. There are simply 
not enough companies out there to develop 
new drugs to treat them all.”


Nicola Nosengo is a freelance writer in Rome. 


1. Singh, N. et al. Nature Commun. http://dx.doi.org/10.1038/ncomms2320 (2013).

2. Singh, N. et al. Neuropsychopharmacology 41, 
1768-1778 (2016). 

3. Scannell, J. W., Blanckley, A., Boldon, H. & 
Warrington, B. Nature Rev. Drug Discov. 11, 
191-200 (2012). 

4. American Chemical Society. International Year of 
Chemistry 2011: Activities Report of the American 
Chemical Society (ACS, 2011); available at 
go.nature.com/1tbzmxn. 

5. Lekka, E., Deftereos, S. N., Persidis, A., Persidis, A. 
& Andronis, C. Drug Discov. Today Ther. Strateg. 8, 
103-108 (2011). 


SOURCES: J. W. SCANNELL ET AL. NATURE REV. DRUG DISCOV. 11, 191-200 (2012); CYTHERA PHARMACEUTICALS


MICHAEL D. KOCK 


POLICY Rubric for prioritizing action on the Sustainable Development Goals p.320
A fond history of the Cavendish, a lab with few rivals p.323
Biomechanics adviser to Finding Dory in conversation p.325
A call to shun predatory journals p.326


Women from a traditional sea-harvesting community fishing in Mozambique. 


Fall in fish catch threatens
human health 


Christopher Golden and colleagues calculate that declining numbers of marine 
fish will spell more malnutrition in many developing nations. 


How will the 10 billion people
expected to be living on Earth by
2050 obtain sufficient and nutri- 
tious food? This is one of the greatest chal- 
lenges humanity faces. Global food systems 
must supply enough calories and protein 
for a growing human population and pro- 
vide important micronutrients such as iron, 
zinc, omega-3 fatty acids and vitamins. 
Deficiencies of micronutrients — so 
called because the body needs them only 
in tiny amounts — can increase the risks of 
perinatal and maternal mortality, growth 
retardation, child mortality, cognitive def- 
icits and reduced immune function’. The 
associated burdens of disease are large. 
Forty-five per cent of mortality in children 


under five is attributable to undernutrition; 
nutritional deficiencies are responsible for 
50% of years lived with disability in chil- 
dren aged four and under’. 

Fish are crucial sources of micronutri- 
ents, often in highly bioavailable forms. 
And fish populations are declining. Most 
previous analyses have considered only how 
people will be affected by the loss of protein 
derived from fish. We calculate that this is 
the tip of the iceberg. Combining data on 
dietary nutrition and fish catch, we predict
that more than 10% of the global popula- 
tion could face micronutrient and fatty-acid 
deficiencies driven by fish declines over the 
coming decades, especially in the developing
nations at the Equator (see ‘Troubled
Waters’). This new view underlines the need
for nutrition-sensitive fisheries policies. 


NUTRITIONAL RISK 
Presently, 17% of the global population is zinc 
deficient, with some subpopulations being 
particularly at risk’. Nearly one-fifth of preg- 
nant women worldwide have iron-deficiency 
anaemia and one-third are vitamin-A defi- 
cient'. We estimate that 845 million people 
(11% of the current global population) are 
poised to become deficient in one of these 
three micronutrients if current trajectories 
in fish-catch declines continue. 
Considering nutrients found only in 
foods derived from animals, such as vitamin
B12 and DHA omega-3 fatty acids
(almost exclusively derived from meat
consumption, see Supplementary Informa- 
tion; go.nature.com/25oll0p), we calculate 
that 1.39 billion people worldwide (19% 
of the global population) are vulnerable to 
deficiencies because fish make up more than 
20% of their intake of these foods by weight. 


IMPACT ASSESSMENT 

To make this sobering new assessment, we 
coupled two databases from 2010, the most 
recent year for which both had data. The new 
Global Expanded Nutrient Supply (GENuS) 
database combines food balance sheets (total 
quantity of food production and imports 
minus livestock feed, post-harvest losses, 
and exports) and production or trade data 
from the Food and Agriculture Organization 
of the United Nations (FAO) with estimates 
of food group intake by age and sex’. It esti- 
mates per capita edible supplies for 225 foods, 
paired with regional food composition tables 
to infer nutrient supplies by country. GENuS 
is supported by the Bill and Melinda Gates 
Foundation and the Winslow Foundation. 

We categorized populations as nutrition- 
ally vulnerable if their nutrient supply was less 
than double the estimated average require- 
ment (EAR), and if they derived from fish 
more than 10% of their vitamin A or zinc, 
or more than 5% of their iron. We chose to 
double the EAR for two reasons. First, even 
in countries where the national average intake 
is greater than the EAR, large variability of 
intake may still mean that a significant part 
of the population is eating less well. If we had 
used the EAR as our threshold, more than 
50% of a nation’s population would need to 
be deemed at risk of nutritional deficiency, 
which we feel is an irresponsibly high pro- 
portion required for raising an alarm. Second, 
our GENuS-derived estimates measure food 
supply rather than food intake, and are gener- 
ally regarded as overestimates of true intake’. 
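
The classification rule described above can be written out as a short sketch. The thresholds (double the EAR; more than 10% from fish for vitamin A or zinc, more than 5% for iron) come from the text; the class, field and function names are hypothetical, and the numbers in the usage example are invented for illustration rather than taken from GENuS.

```python
from dataclasses import dataclass

# Fish-share thresholds stated in the text, as fractions of total nutrient supply.
FISH_SHARE_THRESHOLD = {"vitamin_a": 0.10, "zinc": 0.10, "iron": 0.05}

# Supply must fall below this multiple of the EAR to count as low.
EAR_MULTIPLIER = 2.0


@dataclass
class NutrientSupply:
    """Per capita supply of one nutrient for one population (hypothetical structure)."""
    nutrient: str             # "vitamin_a", "zinc" or "iron"
    supply_per_capita: float  # same units as ear
    ear: float                # estimated average requirement
    fish_share: float         # fraction of the nutrient derived from fish, 0 to 1


def is_vulnerable(s: NutrientSupply) -> bool:
    """Apply both conditions: low supply relative to the EAR and reliance on fish."""
    low_supply = s.supply_per_capita < EAR_MULTIPLIER * s.ear
    fish_dependent = s.fish_share > FISH_SHARE_THRESHOLD[s.nutrient]
    return low_supply and fish_dependent


if __name__ == "__main__":
    # Invented example values, for illustration only.
    example = NutrientSupply("iron", supply_per_capita=15.0, ear=10.0, fish_share=0.06)
    print(is_vulnerable(example))
```

Both conditions must hold: a population with ample supply is not flagged however much of that supply comes from fish.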

The Sea Around Us database, released in 
2016, provides a portrait of marine fisheries 
catch between 1950 and 2010 for every coastal 
nation®. Over 15 years, a team of researchers 
in every coastal country collated informa- 
tion from government documents, academic 
research and maritime records to reconstruct 
the numbers of fish caught. This database 
measures the contribution of subsistence, 
artisanal and industrial marine fisheries to 
food supply at the country level more accu- 
rately than previous estimates’. This database 
was funded by the Pew Charitable Trust and 
the Paul G. Allen Family Foundation and is 
maintained by staff at the University of British 
Columbia in Vancouver, Canada. 

These global marine catch data are 
alarming. Conservative estimates by the FAO 
characterize global fisheries as stable, but 
acknowledge that global catch has declined by 
0.38 million tonnes per year since 1996 (ref. 
4). The Sea Around Us estimates summarized 
earlier this year paint a much bleaker picture,
in which fish catch peaked in 1996 and has 
been falling by 1.22 million tonnes (roughly 
1%) per year since — three times faster than 
the decline reported by FAO. The degrada- 
tion of marine habitat by destructive fishing 
practices, industrial pollution, climate change 
and coastal development for urbanization and 
aquaculture is likely to further degrade ocean 
ecosystems and reduce fisheries yields. Such 
patterns call into question the ability of wild 
fisheries to support future demand for fish. 
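
As a quick check on the figures above, the two decline rates can be compared directly. The rates and the 1996 start year are from the text; the constant and function names are hypothetical, and the cumulative figure assumes a constant linear decline, which is a simplification.

```python
# Annual decline in global catch since 1996, in million tonnes per year (from the text).
FAO_DECLINE_MT_PER_YEAR = 0.38
SEA_AROUND_US_DECLINE_MT_PER_YEAR = 1.22


def decline_ratio(faster: float, slower: float) -> float:
    """How many times faster one annual decline is than the other."""
    return faster / slower


def cumulative_decline(rate_mt_per_year: float, start_year: int, end_year: int) -> float:
    """Total decline over a period, assuming a constant linear rate (a simplification)."""
    return rate_mt_per_year * (end_year - start_year)


if __name__ == "__main__":
    ratio = decline_ratio(SEA_AROUND_US_DECLINE_MT_PER_YEAR, FAO_DECLINE_MT_PER_YEAR)
    print(f"ratio of decline rates: {ratio:.1f}")
    for label, rate in [("FAO", FAO_DECLINE_MT_PER_YEAR),
                        ("Sea Around Us", SEA_AROUND_US_DECLINE_MT_PER_YEAR)]:
        total = cumulative_decline(rate, 1996, 2010)
        print(f"{label}: {total:.2f} million tonnes lost, 1996 to 2010")
```

The ratio comes out slightly above three, consistent with the "three times faster" comparison.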


PERFECT STORM 

The health impacts of fishery declines will 
hit some places harder than others. A perfect 
storm is brewing in the low-latitude develop- 
ing nations. This is where human nutrition is 
most dependent on wild fish, and where fish- 
eries are most at risk from illegal fishing, weak 
governance, poor knowledge of stock status, 
population pressures and climate change. 

Sharp declines in the health of fisheries were 
first described in the twentieth century in 
high-latitude regions where industrial fishing 
began, such as the northwest Atlantic Ocean. 
Developed countries have compensated with 
intensive agricultural production, by import- 
ing goods (including fish), vitamins, supple- 
ments and fortified foods. Since the 1990s, 
the major declines in fish stocks have been in 
lower latitudes and developing nations. This 
rapid degradation probably results from the 
increasing industrialization of fisheries, poor 
governance and the accelerated expansion of 
foreign fishing in these regions. 

These sensitivities in the tropics will be only 
exacerbated by climate change. Ocean warm- 
ing and shifts in net primary production are 
likely to drive remaining fish and shellfish 
species from low to high latitudes, potentially 
reducing catch globally by more than 6% and 
by as much as 30% in some regions (such as
the tropics) by 2050 relative to recent decades.
Fish will also
probably get smaller in the tropics: ocean 
warming and associated declines in oxygen 
content are projected to reduce the average 
biomass of fish communities by around 20% 
during this period’. Coral reefs, essential 
ecosystems for many tropical coastal subsist- 
ence and artisanal fisheries, will be heavily 
degraded by warming and ocean acidifica- 
tion. Mangroves — nurseries for many fish 
that are crucial in developing nations — con- 
tinue to decline. Also under threat are global 
inland freshwater fisheries, another crucial 
source of nutrition and livelihood for hun- 
dreds of millions of people around the world’. 

In these same regions, fish play an impor- 
tant part in the avoidance of diseases associ- 
ated with malnutrition. Nearly all countries 
that depend heavily on fish for nutrition are 
situated in the developing world (46 of 49); we 
defined these as nations in which more than
20% of the population's animal-based food 
by weight is seafood. Furthermore, countries 
with the highest levels of undernourishment 
and the weakest governance are often net 
exporters of seafood to well-nourished coun- 
tries with strong governance’. 

Poor people have fewer alternatives to 
make up for these impending shortfalls in 
access to micronutrients. Meat, eggs, vitamin 
supplements and imported fish can be pro- 
hibitively expensive. Communities are often 
forced to rely on what they can harvest locally 
or on less-healthy processed foods. 


FARMED FISH 

Could global increases in fish farming meet 
the nutritional shortfalls we predict for poor 
equatorial populations? With today’s produc- 
tion and distribution patterns, we think not. 
Aquaculture has expanded globally by more 
than an order of magnitude over the past 
three decades’. Farmed fish exceeded wild 
catch destined for human consumption for 
the first time in 2014 (ref. 10). However, aquaculture
has not yet developed significantly in
many low-income countries where food and
technology are in short supply (particularly
sub-Saharan Africa and the Pacific Islands).
Such regions still largely depend on domes- 
tic, subsistence and artisanal fishing. These 
small-scale fisheries include the pirogue fleets 
of West Africa, the spearfishers of the world’s 
coral reefs, and shore-based gathering of 
shellfish in mangrove creeks. 

Where aquaculture is growing, much of 
it has been aimed at wealthier consumers 
in domestic cities or in international mar- 
kets, rather than local rural areas’. Globally, 
developing countries export higher-value fish
(caught and farmed) and import lower-value 
food from industrial fisheries in developed 
countries’’. For example, shrimp, tilapia and 
Mekong catfish grown in developing and 
transitional countries such as Bangladesh, 
China, Indonesia, Ecuador, Thailand and 
Vietnam are mostly exported to the wealthy 
countries of Europe and North America, or 
consumed by the growing middle-classes in 
the megacities of these economies. The bene- 
fits of these export-oriented industries to live- 
lihoods and nutrition of the coastal poor are 
unclear and cannot be inferred from national 
seafood balance data or national economic 
growth statistics'*. Moreover, commercial 
aquaculture can displace coastal and inland 
fisheries, and small-scale aquaculture produc- 
ers may not always profit from engagement in 
global value chains. 

Farmed fish may also be of lower nutri- 
tional value’. Aquaculture species that are 
most affordable, such as carp, are often not 
as rich in omega-3 fatty acids as are the wild 
species currently accessible to impoverished 
communities, such as sardines and mackerels.
Oil-rich wild fish are also the basis for
aquaculture feeds. Because supplies of these
wild fish have little potential for expansion
under current management policies,
plant-based feeds are increasingly used,
further altering the nutritional composition
of farmed fish. And aquaculture generally
focuses on fewer species than those caught
from the wild. A global fish supply dominated
by aquaculture, as it is currently practiced,
would lead to a drop in the diversity, and thus
nutritional quality, of many diets.

In the low-latitude developing nations, human nutrition is most dependent on wild fish, and fisheries are most
at risk from illegal fishing, weak governance, poor knowledge of stock status, population pressures and climate
change. These countries urgently need effective strategies for marine conservation and fisheries management.

In developing small island states of
the Pacific, wild fisheries will move
poleward because of a rise in sea
temperature, and aquaculture in
deltas and floodplains will be
affected by rising sea levels.

Yet, when explicitly planned to improve 
local well-being, aquaculture can be a crucial
contribution to local diets and economies. 
For example, in Bangladesh, smallholder 
integrated systems in which fish are raised 
in rice paddies have improved local food 
security”. Small indigenous fish, rich in 
nutrients, can be grown for household con- 
sumption in ponds together with carp, tila- 
pia or catfish as cash crops. Less-intensive 
and more-diverse forms of aquaculture may 
have the most potential to meet the nutrition 
and food-security needs of the poor. 

This promise may, however, be constrained 
by lack of suitable sites. Both inland and 
coastal production are under increasing pres- 
sure from urbanization and industrialization. 
In the same small island states likely to be hit 
hardest by shifts in the distribution of marine 
species’, prospects for increasing aquaculture 
are at best mixed. Much aquaculture pro- 
duction is concentrated in river deltas and 


floodplains, brackish-water lagoons and other 
low-lying tropical coastal areas. These areas 
will be heavily affected by sea-level rise, ocean 
acidification and increased storm intensity. 
Non-intensive aquaculture technologies face 
many challenges and are currently minor con- 
tributors to global production. 


NEXT STEPS 

Next-generation models that integrate data 
on human health and fisheries, such as those 
explored here, with climate and population 
models need to play an important part in 
estimating the human health burdens of envi- 
ronmental change and the enormous public 
health dividends of natural-resource manage- 
ment. These models can also be used to iden- 
tify hotspot countries that urgently need more 
effective strategies for marine conservation 
and fisheries management to rebuild stocks 
for nutritional security’. 

Aquaculture also needs reform so that 
undernourished people can access nutritious 
products. The following can help: boosting 
investment in less intensive and domestically 
oriented aquaculture of cheap and nutritious 
species; farming species lower in the food 
chain to reduce dependence on wild-caught 
fish meal; and allocating coastal land and 
water resource rights to small-scale aquacul- 
ture (as has already been done for fisheries). 

The analytical methods currently in use 
to inform fisheries and aquaculture policies 
require refinement. Data on food-price
fluctuations are needed for local economic
models of fish supply and dietary substitution,
and empirical research is required to under- 
stand and model how populations around 
the world will adapt to changes in fish sup- 
ply and market prices. We need better data 
on freshwater fisheries and aquaculture, as 
well as on the nutrient composition of foods 
and nutritional status of more human pop- 
ulations around the world. Improvements 
should include separating data on farmed 
and wild fish to better characterize vulner- 
ability to micronutrient deficiencies. 
Addressing these emerging problems 
will require new interdisciplinary partner- 
ships among fisheries scientists, aquacul- 
ture technologists, ecosystem managers, 
nutrition and public-health specialists, 
development economists, granting agen- 
cies and policymakers. As a first step, new 
funding streams are required to support 
the emerging discipline of planetary health, 
dedicated to characterizing and quantifying 
the human health impacts of accelerating 
global environmental change. For example, 
our work has been supported by the Well- 
come Trust, the US National Socio-Environ- 
mental Synthesis Center and the Rockefeller 
Foundation. A second step would be more 
interaction between health agencies (such as 
the World Health Organization, the United 
Nations children’s fund UNICEF, and health 
ministries) and ocean-management
agencies (such as the FAO, the UN Environment
Programme, regional fisheries
management organizations, and minis- 
tries of fisheries and the environment). 

Mitigating losses of biodiversity and
income has been at the heart of
fisheries-management policies. In our view, there
should be a much stronger emphasis on 
human health. This would mirror recent 
shifts in agricultural policy that respond 
to rising burdens of diet-related diseases. 

These policy changes are possible. We 
believe that improvements in fisheries 
management and marine conservation can 
serve as nutritional delivery mechanisms. 
A meta-analysis of nearly 5,000 fisheries 
worldwide found that applying sound 
management reforms to global fisheries 
could increase catch by more than 10%.
Without these changes, the health of the
poor is at risk.

Solar lights are used by vendors in rural western India, where lack of electricity has stymied development.


Map the interactions 
between Sustainable 
Development Goals 


Mans Nilsson, Dave Griggs and Martin Visbeck present 
a simple way of rating relationships between the targets 
to highlight priorities for integrated policy. 


Next month in New York, the United
Nations’ 2030 Agenda on Sustainable
Development will have its
first global progress review. Adopted by the 
UN General Assembly in 2015, the agenda 
represents a new coherent way of think- 
ing about how issues as diverse as poverty, 
education and climate change fit together; 
it entwines economic, social and environmental
targets in 17 Sustainable Development
Goals (SDGs) as an ‘indivisible whole’.

Implicit in the SDG logic is that the goals
depend on each other — but no one has spec- 
ified exactly how. International negotiations 
gloss over tricky trade-offs. Still, balancing 
interests and priorities is what policymak- 
ers do — and the need will surface when the 
goals are being implemented. If countries 
ignore the overlaps and simply start try- 
ing to tick off targets one by one, they risk 
perverse outcomes. For example, using coal 
to improve energy access (goal 7) in Asian
nations, say, would accelerate climate change
and acidify the oceans (undermining goals
13 and 14), as well as exacerbating other
problems such as damage to health from air
pollution (disrupting goal 3).

AMIT DAVE/REUTERS

If mutually reinforcing actions are taken 
and trade-offs minimized, the agenda will be 
able to deliver on its potential. For example, 
educational efforts for girls (goal 4) in south- 
ern Africa would enhance maternal health 
outcomes (part of goal 3), and contribute to 
poverty eradication (goal 1), gender equality 
(goal 5) and economic growth (goal 8) locally. 

The importance of such interactions is built 
into the SDGs: ‘policy coherence’ is one of the 
targets. The problem is that policymakers 
and planners operate in silos. Different min- 
istries handle energy, agriculture and health. 
Policymakers also lack tools to identify which 
interactions are the most important to tackle, 
and evidence to show how particular inter- 
ventions and policies help or hinder progress 
towards the goals. Many preconceptions that 
influence decisions are outdated or wrong, 
such as the belief that rising inequalities 
are necessary for economic growth, or that 
mitigating climate change is bad for 
productivity growth in the long term.

To make coherent policies and strategies, 
policymakers need a rubric for thinking 
systematically about the many interactions — 
beyond simply synergies and trade-offs — in 
order to quickly identify which groups could 
become their allies and which ones they will 
be negotiating with. And they need up-to- 
date empirical knowledge on how the goals 
and interventions of one sector affect another 
positively or negatively. 

As a first step, we propose a seven-point
scale of SDG interactions (see ‘Goals scoring’) 
to organize evidence and support decision- 
making about national priorities. This should 
help policymakers and researchers to iden- 
tify and test development pathways that 
minimize negative interactions and enhance 
positive ones. And it is globally applicable so 
that countries can compare and contrast, and 
learn from each other and over time. 


SEVEN INTERACTION TYPES 

We rate seven possible types of interactions, 
from the most positive (scoring +3) to the 
most negative (-3). These can be applied at 
any level: among goals and targets, to individual
policies or to actions (see ‘The wins
and losses en route to zero hunger’).

For practical policymaking, the process 
should start from a specific SDG — in line 
with a minister’s mandate — and map out, 
score and qualify interactions in relation to 
the other 16 goals and their targets. 
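
The mapping exercise described above can be sketched as a small data structure. The seven names and their scores follow the ‘Goals scoring’ scale; the `summarize` helper is hypothetical, and the scores in the example are illustrative assumptions, not values assigned by the authors.

```python
from enum import IntEnum


class Interaction(IntEnum):
    """The seven-point scale, from most negative to most positive."""
    CANCELLING = -3
    COUNTERACTING = -2
    CONSTRAINING = -1
    CONSISTENT = 0
    ENABLING = 1
    REINFORCING = 2
    INDIVISIBLE = 3


def summarize(scores):
    """Split the scored goals into those to build on and those that need trade-offs.

    `scores` maps a goal number (1 to 17) to an Interaction value.
    """
    return {
        "build_on": sorted(goal for goal, s in scores.items() if s > 0),
        "trade_offs": sorted(goal for goal, s in scores.items() if s < 0),
    }


if __name__ == "__main__":
    # Illustrative only: hypothetical scores for expanding energy access (goal 7)
    # using coal, seen from an energy minister's starting point.
    coal_energy_access = {
        13: Interaction.COUNTERACTING,  # climate action (assumed score)
        14: Interaction.COUNTERACTING,  # oceans (assumed score)
        3: Interaction.CONSTRAINING,    # health (assumed score)
        4: Interaction.ENABLING,        # education, through electric lighting
    }
    print(summarize(coal_energy_access))
```

Starting from one goal and scoring it against the others keeps the exercise tied to a single mandate, as the text suggests; the sketch deliberately avoids summing scores, since the scale is ordinal.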

Positive interactions lend themselves 
to building strategies across sectors. The 
three negative types will be subject to trade- 
offs, and the target of extra regulations and 
policies, such as bans. But negative-scoring 


COMMENT 


The influence of one Sustainable Development Goal or target on another can be summarized with this 


GOALS SCORING 

simple scale. 

Interaction | Name Explanation 

iS Indivisible Inextricably linked to the 
achievement of another goal. 

a Reinforcing Aids the achievement of 
another goal. 

sell Enabling Creates conditions that 
further another goal. 

0 Consistent No significant positive or 
negative interactions. 

-1 Constraining 

-2 Counteracting Clashes with another goal. 

-3 Cancelling Makes it impossible to reach 
another goal. 


interactions might also attract public invest- 
ment in technologies and solutions that over 
time might push the needle up the scale. 
There are four main considerations when 
applying the scale. First, is the interaction 
reversible or not? For example, failing on 
education (goal 4) could irreversibly damage 
social inclusion (goal 8). Loss of species owing 
to lack of action on climate change (goal 13) is 
another irreversible 
interaction. Con- “Thereisno 
versely, converting formal platform 
land use fromagri- for sharing 


culture tobioenergy knowledge 
production (goal7) related tothe 
might counteract goals. id 


food security (goal 
2) and poverty reduction (goal 1) but could 
be reversed. 

Second, does the interaction go in both 
directions? For instance, providing energy 
to people’s homes benefits education, but 
improving education does not directly 
provide energy. 

A third consideration is the strength of the 
interaction: does an action on one goal have 
a large or small impact on another? Negative 
interactions can be tolerable if they are weak, 
such as the constraints that land resources 
might put on the development of transport 
infrastructure. 

Fourth, how certain or uncertain is the interaction: is there
evidence that it will definitely happen or is it only possible?

GOALS SCORING: examples

Interaction | Name | Example
+3 | Indivisible | Ending all forms of discrimination against women and girls is indivisible from ensuring women’s full and effective participation and equal opportunities for leadership.
+2 | Reinforcing | Providing access to electricity reinforces water-pumping and irrigation systems. Strengthening the capacity to adapt to climate-related hazards reduces losses caused by disasters.
+1 | Enabling | Providing electricity access in rural homes enables education, because it makes it possible to do homework at night with electric lighting.
0 | Consistent | Ensuring education for all does not interact significantly with infrastructure development or conservation of ocean ecosystems.
-1 | Constraining (limits options on another goal) | Improved water efficiency can constrain agricultural irrigation. Reducing climate change can constrain the options for energy access.
-2 | Counteracting | Boosting consumption for growth can counteract waste reduction and climate mitigation.
-3 | Cancelling | Fully ensuring public transparency and democratic accountability cannot be combined with national-security goals. Full protection of natural reserves excludes public access for recreation.
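
For readers who want to record scores in a spreadsheet or script, the seven-point scale can be written down as a small lookup. The following is only a sketch: the names `Interaction`, `EXPLANATIONS` and `describe` are illustrative choices made here, not part of the framework itself.

```python
from enum import IntEnum


class Interaction(IntEnum):
    """Seven-point scale for the influence of one goal or target on another."""
    CANCELLING = -3
    COUNTERACTING = -2
    CONSTRAINING = -1
    CONSISTENT = 0
    ENABLING = 1
    REINFORCING = 2
    INDIVISIBLE = 3


# Explanations copied from the scoring table.
EXPLANATIONS = {
    Interaction.INDIVISIBLE: "Inextricably linked to the achievement of another goal.",
    Interaction.REINFORCING: "Aids the achievement of another goal.",
    Interaction.ENABLING: "Creates conditions that further another goal.",
    Interaction.CONSISTENT: "No significant positive or negative interactions.",
    Interaction.CONSTRAINING: "Limits options on another goal.",
    Interaction.COUNTERACTING: "Clashes with another goal.",
    Interaction.CANCELLING: "Makes it impossible to reach another goal.",
}


def describe(score: int) -> str:
    """Return the label and explanation for a score between -3 and +3."""
    level = Interaction(score)  # raises ValueError for scores off the scale
    sign = f"{level.value:+d}" if level.value else "0"
    return f"{sign} {level.name.capitalize()}: {EXPLANATIONS[level]}"


if __name__ == "__main__":
    for level in sorted(Interaction, reverse=True):
        print(describe(level))
```

A score says nothing by itself about reversibility, direction, strength or certainty; those four considerations would need to be recorded alongside it.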


CONTEXT MATTERS 

Countries must interpret the SDGs according 
to their national circumstances and levels 
of development, so interaction scores will 
vary. Differences in geography, governance 
and technology make it dangerous to rely on 
generalized knowledge. 

The regional resource base makes a big difference. For
instance, bioenergy production is widely assumed to
counteract food security through land competition. But in
the Nordic region, bioenergy markets have reinforced the
agricultural and forest production systems — offering new
and more diversified market opportunities and increasing
farmers’ and forest owners’ resilience. Introducing
technologies can render interactions more positive. For
example, a transition to electric cars, fuelled by
low-carbon power, could make personal-car-based mobility
more consistent with climate-change goals.

Negative interactions may be the result of weakness in
institutions, legal rights or governance procedures, which
marginalize vulnerable groups. For example, poorly governed
industrialization and infrastructure development (goal 9)
in emerging economies or agricultural productivity efforts
(goal 2) can counteract local livelihoods and increase
inequalities (working against goal 10).

Timescale matters: intensifying food production to end
hunger in places where resources are scarce may be feasible
in the short term, but over time can deplete fisheries and
forests. And spatial scale matters, too: for instance,
industrial development may cause pollution and adversely
affect the local environment and people’s health, but may
also generate wealth that can support national health
infrastructure. Politicians might mandate that health plans
directly benefit the local community.

This conceptual framework is a starting point for building
an evidence base to characterize the goal interactions in
specific local, national or regional contexts. There is no
formal platform for sharing such knowledge yet, but the
International Council for Science (ICSU) is beginning to
use the framework and populate it with empirical evidence.
The ICSU is bringing together research teams of leading
experts from universities and institutes around the world
to develop thematic case studies, starting with the SDGs
for health, energy and food. Each team will define the
expertise needed to characterize and quantify the domain's
interactions with all other SDGs, organize existing
knowledge about these interactions, and identify key gaps
and priorities.

Many knowledge gaps will surface. For example, the
relationship between urban developments and human health
and well-being is only beginning to be studied. Filling the
gaps will be costly and will require contributions from
research councils and funders such as the European Union's
Horizon 2020 framework, as well as governments and
universities. The UN should consider how best to track
interactions in its SDG monitoring systems, which are now
being designed. Tracking interactions will be more
complicated than monitoring single sectors, but it could be
done in detail in a few key places, such as for the nine
SDG pilot countries, which include Uganda and Vietnam.

This interactions framework is intuitive, relatively easy
to use and broadly replicable. It will facilitate the
accumulation of knowledge and policy learning across
countries. To further ensure that the research meets
governments’ needs, the ICSU and other knowledge brokers
such as the Organisation for Economic Co-operation and
Development and the UN should convene a series of dialogues
and workshops around interactions and how to apply them to
policy-making. A first opportunity to put SDG interactions
on the agenda is at next month’s high-level political
forum, where 22 countries, including Germany and Colombia,
will report back on their early action plans.

A hydropowered irrigation pump in use at the Kabwadu Women’s Banana Farm in Zambia.

WORKED EXAMPLE

The wins and losses en route to zero hunger

In sub-Saharan Africa, ending hunger (goal 2) interacts
positively with several other goals — including poverty
eradication (goal 1), health promotion (goal 3) and
achieving quality education for all (goal 4). Addressing
chronic malnourishment is ‘indivisible’ from addressing
poverty — which gains the interaction a score of +3.
Tackling malnourishment reinforces (+2) educational efforts
because children can concentrate and perform better in
school. Not addressing food security would counteract (-2)
education, when the poorest children have to help provide
food for the day.

Food production interacts with climate-change mitigation
(goal 13) in several ways, because agriculture represents
20-35% of total anthropogenic greenhouse-gas emissions.
Climate mitigation constrains (-1) some types of food
production, in particular those related to meat (methane
release from livestock constitutes nearly 40% of the global
agricultural sector’s total emissions). Yet food production
is reinforced (+2) by a stable climate. Securing food from
fisheries is also reinforced by protecting the climate,
because that limits ocean warming and acidification.

Finally, in some parts of sub-Saharan Africa, promoting
food production can also constrain (-1) renewable-energy
production (goal 7) and terrestrial ecosystem protection
(goal 15) by competing for water and land. Conversely,
limited land availability constrains (-1) agricultural
production.
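
The scores quoted in the worked example on zero hunger can be kept as directed records, which makes two of the framework's points visible: an interaction need not run in both directions, and the same pair of goals can carry more than one score. This is a hedged sketch only; the `Link` record, the `LINKS` list and both helper functions are hypothetical names introduced here, and reading "reinforced by a stable climate" as an influence of goal 13 on goal 2 is an interpretation made for the illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Link(NamedTuple):
    source: int  # goal whose pursuit exerts the influence
    target: int  # goal that is affected
    score: int   # position on the -3..+3 scale
    note: str


# Scores as quoted in the worked example for sub-Saharan Africa.
LINKS = [
    Link(2, 1, 3, "addressing malnourishment is indivisible from addressing poverty"),
    Link(2, 4, 2, "tackling malnourishment reinforces educational efforts"),
    Link(13, 2, -1, "climate mitigation constrains some food production, notably meat"),
    Link(13, 2, 2, "a stable climate reinforces food production"),
    Link(2, 7, -1, "food production competes with renewable energy for water and land"),
    Link(2, 15, -1, "food production competes with ecosystem protection for water and land"),
]


def scores_by_pair(links):
    """Group scores by (source, target) without averaging them."""
    grouped = defaultdict(list)
    for link in links:
        grouped[(link.source, link.target)].append(link.score)
    return dict(grouped)


def mixed_pairs(links):
    """Return the pairs whose recorded scores include both signs."""
    return sorted(
        pair
        for pair, scores in scores_by_pair(links).items()
        if min(scores) < 0 < max(scores)
    )


if __name__ == "__main__":
    for pair, scores in scores_by_pair(LINKS).items():
        print(pair, scores)
    print("mixed:", mixed_pairs(LINKS))
```

Scores for a pair are listed rather than averaged because a constraint and a reinforcement between the same two goals describe different mechanisms, and collapsing them into one number would hide both.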
CORRECTION

Reference 1 in the Comment ‘Create a 
global microbiome effort’ (N. Dubilier 
et al. Nature 526, 631-634; 2015) gave 
incorrect page numbers. It should have 
read: Alivisatos, A. P. et al. Science 350, 
507-508 (2015). 


Ernest Rutherford (right) and Jack Ratcliffe in the Cavendish Laboratory in 1932. 


BOOKS & ARTS

Crucible of science

Graham Farmelo ponders Malcolm Longair’s study of
the Cavendish, a physics laboratory with few rivals.

Maxwell’s Enduring Legacy: A Scientific History of the Cavendish Laboratory
MALCOLM LONGAIR
Cambridge University Press: 2016.

Physics laboratories are a relatively recent innovation.
They started to spring up 150 years ago, centuries after
investigators began trying to understand the inanimate
world using observation and rational thought. Chemists were
ahead of the game — their first laboratories appeared a
quarter of a millennium earlier (D. Lowe Nature 521, 422;
2015).

In 1927, Ernest Rutherford spoke at the opening of the
H. H. Wills Physics Laboratory at the University of
Bristol, UK. He said: “Our pure science laboratories should
in the main be set aside for fundamental research.” Those
that did research relating to industry should be funded by
the government or manufacturers, in places “where the
research workers can come into close contact with
manufacturing conditions”. Rutherford was at the time
director of the Cavendish Laboratory at the University of
Cambridge, UK — a world-leading institution for
experimental physics. Today, his purism looks almost
quaint, with most academic physics laboratories relying
heavily on support from industry and other sources.

In what is patently a labour of love, the astronomer
Malcolm Longair now gives us a comprehensive scientific
history of the Cavendish in Maxwell’s Enduring Legacy.
Longair, who was the lab’s head from 1997 to 2005,
describes its inception well. Its early development in the
1870s, on a small site near the centre of town, was
bankrolled by the university’s chancellor, William
Cavendish. He was among those who wanted to ensure that
Cambridge could continue to supply top-notch graduates to
help to administer the increasingly technological British
Empire. Several sceptics, particularly among the champions
of the prestigious natural-sciences course, the Tripos,
argued that experimental training was unnecessary. The
mathematician Isaac Todhunter wrote: “Experimentation is
unnecessary for the student.” He believed that “the student
should be prepared to accept whatever the master told him”.

The venture made an excellent start, 
Longair shows, when James Clerk Maxwell 
was appointed its first director. A strong 
mathematician with almost superhuman 
physical intuition, he was determined to 
nurture experimentation, and had extra- 
ordinarily wide interests. He was as eager to 
explore the new technology of wireless tele- 
graphy as he was to master modern topologi- 
cal mathematics. After his death in 1879 at 
just 48, the university appointed as his succes- 
sor John William Strutt, Lord Rayleigh, a ver- 
satile physicist who went on to discover argon 
and win a Nobel prize. Strutt’s five years in the 
post consolidated the reputation of the lab. 
He was succeeded by theorist J. J. Thomson, 
who, although not a dexterous experimenter, 
discovered the electron at his bench there in 
1897. With Thomson at the helm, the Cav- 
endish rivalled the mighty Imperial Physical 
Technical Institute in Berlin as the world’s 
pre-eminent centre for experimental physics. 

The next director, Rutherford, died unex- 
pectedly in 1937 after a botched operation. In 
the quest to find his successor, Longair says, 
the crystallographer Lawrence Bragg was the 
“obvious choice”. This surprised me. In 1992, 
theoretical physicist Rudolf Peierls, who knew 
the lab well, told me that Rutherford’s deputy 
James Chadwick (discoverer of the neutron) 
was the widely tipped successor, and that the 
failure to appoint him led to “a minor scandal”. 
Either way, Bragg proved a far-sighted leader, 
and his promotion of crystallography yielded 
impressive results. Notably, Francis Crick and 
James Watson gave the Cavendish one of its 
greatest triumphs by identifying the structure 
of DNA in 1953. 

The quantum theorist Nevill Mott, 
appointed director in 1954, continued the 
policy of diversification and expansion. 
Teaching and research activity doubled, 
and the lab’s Radio Astronomy Group,
led by Martin Ryle, had a series of suc- 
cesses, most famously the discovery of 
pulsars by Jocelyn Bell Burnell in 1967, 
working with her supervisor Antony 
Hewish (A. Hewish et al. Nature 217, 
709-713; 1968). By this time, the 
Cavendish was so large that its director 
was not so much a powerful commander-
in-chief as chair of a company, as Longair
aptly describes it. 

The lab’s research had outgrown its 
space: the number working there had 
risen from around a dozen in the 1870s 
to some 40 times that number. In 1973, 
the next director, Brian Pippard, moved 
the Cavendish to much larger premises in 
West Cambridge, the workplace of about 
1,000 people. Longair chronicles this move 
and presents the achievements of Pippard 
and his successors as Cavendish Profes- 
sor of Physics, Sam Edwards and Richard 
Friend, with detail that will satisfy the 
most assiduous reader of annual reports. 
The breadth and depth of the areas of physics now being
explored by the laboratory are remarkable: all its previous
specialities, as well as everything from optoelectronics to
medical physics, thin-film magnetism and the physics that
underlies studies of the sustainability of the global
economy.

Longair’s history is in the form of a 
well-organized modern physics book, 
most of its 22 sections replete with charts, 
tables and lucid technical explanations 
presented neatly in boxes. Abundant 
diagrams, photographs, line drawings, 
floor-plans and facsimiles of historical 
documents give fascinating insights into 
the lab’s development. Very much the 
account of an insider, the book would 
have benefited from a wider international 
perspective. 

It would also have been interesting 
to hear more about the challenges that 
the lab faces to preserve its eminence. 
Rutherford kept an eye on almost every 
research project — no longer feasible for 
even the most energetic director — and 
took personal responsibility for keeping 
his fiefdom fleet of foot so that it could 
respond quickly to developments. The 
main challenge of directing the laboratory
today, I imagine, is to ensure that the
elephant can keep dancing.
MEDICAL RESEARCH


Citizen medicine 


Sally Frampton and Sally Shuttleworth explore a show 
on public involvement in the evolution of vaccination. 


The introduction of vaccination in
the late eighteenth century is often
viewed as a defining moment, when 
modern medicine began to stem the ravages 
of disease. But it has not just been down to 
pioneering doctors: members of the public 
have been significant in shaping the devel- 
opment of vaccination, both as practitioners 
and as critics. Vaccination: Medicine and the 
Masses, an exhibition at the Royal College of 
Surgeons’ Hunterian Museum in London, 
seeks to unravel those threads with photo- 
graphs, letters, pamphlets, specimens and 
medical devices. 

The exhibition is part of Constructing Scientific
Communities (http://conscicom.org), a project on citizen
science past and present for which we are researcher (S.F.)
and principal investigator (S.S.). In 1798, physician
Edward Jenner published An Inquiry into the Causes and
Effects of the Variolae Vaccinae (a draft manuscript
features in the
exhibition). It showed that protection from 
the deadly, disfiguring disease smallpox could 
be conferred by exposure to the much milder 
cowpox. Jenner’s experiments convinced 
fellow medics, but were themselves inspired 
by the observations of farming communities 
in southwest England that milkmaids (prone 
to cowpox infection, acquired by handling 
the udders of infected animals) hardly ever 
caught smallpox. 

Local knowledge and volunteers remain 
key to successful vaccination programmes. 
The Global Polio Eradication Initiative, for 
example, has involved more than 20 million 
volunteers since it began in 1988, many work- 
ing in dangerous conditions (H. J. Larson and 
I. Ghinai Nature 473, 446-447; 
2011). One story highlighted 
in the exhibition focuses on 
Ali Maow Maalin, the Somali 
cook who was the last person 
to be infected with naturally 
occurring smallpox. After he 
recovered, Maalin campaigned 
for polio eradication. He died of 
malaria in 2013, while carrying 
out polio vaccinations. 

But public resistance has also dogged vaccination, as the
exhibition makes clear. In the nineteenth and twentieth
centuries, pamphlets from Britain’s National
Anti-Vaccination League and others played on fears that the
[…] children’s blood and laid them open to a host of
diseases. Resistance grew to the British government's
compulsory smallpox-vaccination programme, established in
1853. By 1907, the programme was effectively abolished.

A nineteenth-century ‘shield’, used to protect vaccination sites.

Vaccination: Medicine and the Masses
Hunterian Museum, London.
Until 17 September

The diseases have changed, but scepticism 
remains dangerous — not least because of 
the lingering impact of Andrew Wakefield’s 
discredited work hinting at a link between 
the measles, mumps and rubella (MMR) jab 
and autism, published almost 20 years ago. 
US and UK outbreaks of measles in recent 
years have had a strong correlation with 
vaccine refusal. 

As highlighted by Constructing Scientific 
Communities, citizen science now benefits 
from digital platforms such as Zooniverse, 
which enable projects that range from 
identifying galaxies to analysing cancer 
cells. That potential makes it timely now 
to look back to when barriers between pro- 
fessional and amateur science had not yet 
been erected. Researchers are looking, for 
instance, at mass involvement in Victorian 
public-health movements such as the drive 
to stop air and water pollution, and at the 
local natural-history groups whose records 
still serve as benchmarks. With Zooniverse, 
we are creating projects drawing on histori- 
cal records of the era: Diagnosis London, 
for instance, will enable people to analyse 
reports of the nineteenth-century Medical
Officers of Health for London.

Like Vaccination, these projects offer fascinating insights
into the lives of people faced with an array of
public-health challenges, and into the medical science that
is running to keep up with them.
Ban predators from
the scientific record


Predatory journals are 
threatening the credibility of 
science. By faking or neglecting 
peer review, they pollute the 
scholarly record with fringe or 
junk science and activist research. 
I suggest that every publishing 
stakeholder could contribute to 
reining in these journals. 

Universities and colleges 
should stop using the quantity of 
published articles as a measure 
of academic performance. 
Researchers and respectable 
journals should not cite articles 
from predatory journals, and 
academic library databases 
should exclude metadata for such 
publications. 

Companies that supply services 
to publishers, including those 
that license journal-management 
software or provide standard 
identifiers, should decline to work 
with predatory publishers. 

Scholarly databases such as 
Scopus and Thomson Reuters 
Web of Science need to raise the 
bar for acceptance, eliminating 
journals and publishers that use 
flawed peer-review practices. 
The US National Center for 
Biotechnology Information 
should do the same for PubMed 
and PubMed Central. 

Finally, advocates of open- 
access publication must stop 
pretending that the author-pays 
model is free of serious, long- 
term structural problems (see 
J. Beall Nature 489, 179; 2012). 
Just because it works well in a
few cases doesn't mean it always
works.

Jeffrey Beall Auraria Library, 
University of Colorado Denver, 
USA. 
jeffrey.beall@ucdenver.edu 


Hail local fieldwork, 
not just global models 


We contend that science’s
‘publish-or-perish’ culture,
which selects for rapid
publication in high-ranking
journals, has contributed to the
demise of field-based studies
(see K.-D. Dijkstra Nature 533,
172-174; 2016).

Top-tier journals tend to 
favour large-scale analyses that 
answer big, general questions 
(see J. M. Fitzsimmons and 
J. H. Skevington Nature 466, 179; 
2010), presumably because they 
help to boost journal impact 
factors. Unlike basic ecological 
and observational studies, such 
analyses seldom involve the 
collection of new, local field data. 
Instead, they depend mainly 
on modelling of published 
information, often over scales 
that would be logistically and 
economically challenging for 
conventional field investigations. 

Because publication in leading 
journals is science’s currency to 
capture funding, funders also 
tend to select against field-based 
research studies — including 
those with undeniable reach 
and importance, such as long- 
term biodiversity monitoring 
(see T. Birkhead Nature 514, 
405; 2014). 

Given the current biodiversity 
crisis, journals and funding 
agencies — as well as the 
scientific community — must act 
to reverse this trend. 

Catarina Ferreira Trent
University, Peterborough, Canada.
C. Antonio Ríos-Saldaña
BioCorima, Arteaga, Mexico.
Miguel Delibes-Mateos Institute
for Advanced Social Studies
(IESA-CSIC), Córdoba, Spain.
catferreira@gmail.com


A code of conduct for
data on epidemics


As a long-term champion of 
open-access research data on 
pandemic viruses and a member 
of the Italian Parliament, I urge 
Brazil to hasten the reform of its 
current biosecurity legislation. 
This would enable sharing
of vital Zika virus samples
and information, as recently
called for by the World Health
Organization (see M.-P. Kieny
et al. Nature 533, 469; 2016, and
go.nature.com/104x3dp).
Data sharing for viruses has
been disappointingly patchy 
since I first ignited the debate 
by depositing my unpublished 
sequence data for H5N1 avian 
influenza virus in a public 
database, rather than in the 
established password-protected 
system (see Nature 440, 255-256; 
2006). When the 2009 H1N1 
swine flu virus emerged, the 
importance of data sharing was 
evident in the rapid response 
to the pandemic. However, the 
first isolate of the Middle East 
respiratory syndrome (MERS) 
coronavirus from Saudi Arabia 
was controversially submitted for 
patenting in 2013 (see go.nature.com/1uu7ldd).
And in last year’s
Ebola virus epidemic, there were 
significant gaps in the availability 
and posting of online sequence 
data (N. L. Yozwiak et al. Nature 
518, 477-479; 2015). 

To overcome such hurdles,
I suggest that the United Nations
and relevant stakeholders should
develop guidelines for scientists,
institutions and governments.
These should harmonize
codes of conduct on sharing
information about emerging
biological threats — including
pathogens that are resistant to
antimicrobials.

Ilaria Capua Italian Chamber of
Deputies, Italy.
ilariacapual@gmail.com


Archive computer 
code with raw data 


As the leader of a young research 
group, I recognize the need to 
archive more than just the raw 
data that underpin scientific 
papers. Archiving computer code 
is also important for safeguarding 
scientific integrity and for 
facilitating ongoing projects. 
Most scientific journals 
demand that researchers make 
their primary data publicly 
available in the interest of 
reproducibility. Access to the 
associated computer code 
enables statistical analyses and 
calculations to be validated 
(see Nature 514, 536; 2014). 
The more explicit the links
between the data, the code and 
the resulting outputs (including 
tables and figures), the easier it is 
to reproduce the findings. 
Software tools such as knitr 
and R Markdown allow the 
description and code of a
statistical analysis to be combined 
into a single document, 
providing a pipeline from the 
raw data to the final results and 
figures. Outputs are updated 
by re-running the scripts using 
version-control tools such as Git 
and GitHub. 
My group has elected 
to use these tools and to 
include R Markdown files as 
supplementary information 
to our publications (see, for 
example, M. A. Stoffel et al. 
Proc. Natl Acad. Sci. USA 112, 
E5005-E5012; 2015). I suggest
that journals encourage this 
practice to help to fight the 
reproducibility crisis. 
Joseph I. Hoffman University of 
Bielefeld, Germany. 
joseph.hoffman@uni-bielefeld.de


Tea but not dinner 
with Karl von Frisch 


In the 1960s, I had reason to 
discuss with my friend the late 
Annemarie Weber, a muscle 
physiologist, the morality of 
ethologist Karl von Frisch’s 
decision to continue his studies 
on honeybee communication 
during the Second World War 
(see M. L. Winston Nature 533, 
32-33; 2016). 

Annemarie’s father, Hans 
Weber, had been removed 
from his post as professor at the 
University of Tübingen because
he was an opponent of the Nazis. 
Too famous to be harmed, he was 
instead transferred to a minor 
university in East Prussia. Her 
precise but nuanced response to 
me was: “After the war, my father 
would have tea with von Frisch
— but dinner, never.”
Michael Katz March of Dimes 
Foundation, White Plains, New 
York, USA. 
mkatz@marchofdimes.org 
The language of flowers


The complete DNA sequences of the two wild parents of the garden petunia 
provide valuable genetic insights into this model plant, and will improve the 
optimization of other crop plants for agriculture. 


SANDRA KNAPP & DANI ZAMIR 


The domestication of plants is often thought to apply
mainly to food crops, but cultivation has also been widely
used to enhance the beauty of ornamental plants. Writing in
Nature Plants, Bombarely et al. report the genome sequences
of two progenitor species of Petunia hybrida, a plant
domesticated for its flowers. These genomes are a notable
addition to the known sequences of members of the
nightshade family (Solanaceae). They will enable
researchers to unravel fundamental mechanisms in evolution,
ecology and gene function, and will help to bring an
understanding of the relationships between plant genomes
closer.

Petunia is used as a model organism, but one 
might nonetheless wonder why the genome of 
a popular flower is of interest. With global con- 
sumption of floriculture products estimated 
to be worth around US$30 billion per year, 
however, much research is aimed at optimiz- 
ing productivity, flower shape, colour, vase-life 
and fragrance. Previous studies have identified many of the
genes that influence Petunia flower characteristics,
highlighting both the evolutionary conservation and
diversification of function between different ornamental
varieties. Bombarely et al. now provide a powerful
platform for the ornamental-flower industry
to translate this information to other species, 
increasing the development of new commer- 
cial varieties and species in this economically 
important research field. 

The flowers of wild petunias come in diverse shapes and
colours. Cultivated petunias are a hybrid between two wild
species — the pink-flowered Petunia inflata and the
white-flowered Petunia axillaris. Bombarely et al.
sequenced both of these genomes, and generated
transcriptomic data (which detail all the messenger RNA
molecules in a cell) from three unrelated cultivated
P. hybrida lines. These data provide a superb resource for
analysing not only the genes that confer particular Petunia
characteristics, but also the genomics of hybridity.

The authors found that most of the genes 
expressed in the cultivated species are from 
P. axillaris — 15,000, compared with only 
600 from P. inflata. This is partly attributable
to the use of the white background colour 
derived from P. axillaris as a playground for 
colour manipulation in the cultivated species. 
An alternative explanation is gene conversion, 
in which one parental set of genes comes to 
predominate. Gene conversion has long been 
thought to be confined to species called polyploids,
in which chromosome doubling has occurred during
evolution. Bombarely and
colleagues suggest that gene conversion similar 
to that commonly seen in polyploid Solanaceae 
crops such as tobacco might also occur in 
hybrids such as Petunia that are not polyploid. 

Colour and scent are crucial for attracting 
pollinators. P. axillaris is moth-pollinated
and produces volatile compounds that give 
its night-blooming flowers their strong scent, 
whereas P. inflata is bee-pollinated and has 
little scent. Bombarely et al. find that gene 
sequence alone cannot explain these differ- 
ences. But the ‘ecosystem’ of the genome is 
complex, consisting of many layers of regu- 
lation that do not alter DNA sequence. The 
authors show that the circadian clock that 
regulates scent production is highly diversi- 
fied in Solanaceae species, perhaps pointing 
to a key role for the biochemical pathways that
regulate circadian rhythms in driving adapta- 
tion to different environmental niches, and 
thus diversification. 

In terms of colour, the Petunia genomes 
provide a powerful resource for understand- 
ing the genomic basis of the biosynthetic path- 
way for pigments called anthocyanins, and for 
analysing how gene position and duplication 
can contribute to diversification of traits, influencing
speciation. Both parental species share
the same core anthocyanin pathway, and, as
expected, the white-flowered P. axillaris has
lost some peripheral components over the 
course of evolution. But some of the genes 
encoding transcription factors that regulate the 
expression of anthocyanin-pathway compo- 
nents reside in exceptionally dynamic regions 
of the genome. The authors provide evidence 
that large and extremely rapid rearrangements 
in these regions were involved in diversifica- 
tion of the Solanaceae. 


Figure 1 | A diverse family of crops. Crops of the Solanaceae family have been bred to produce diverse agricultural and ornamental products, including fruits 
(tomatoes), flowers (petunias) and tubers (potatoes). Bombarely et al. report the genome sequences of two progenitor species of the domesticated Petunia,
which will help researchers to dissect the genetic basis of crop productivity. 
Human domestication of many Solanaceae-
related ancestors has resulted in related mod- 
ern crops that have similar sets of genes but 
highly variable characteristics. For example, 
petunia, aubergine, tomato, pepper, potato 
and tobacco are all derived from members of 
the Solanaceae family, and sweet potato and 
coffee are members of the same clade, a larger 
grouping called the euasterids. Each species 
has been bred to enhance the productivity of 
different organs (Fig. 1). Thanks to Bombarely 
and colleagues’ work, the genomes of this 
unique cluster of closely related crops are all 
now available, which allows us to investigate
a major biological question — what are the 
genetic factors that dictate a plant’s balance 
between vegetative, photosynthetic carbon- 
dioxide-fixing organs (known as the source), 
and the reproductive organs consumed by 
humans (the sink) that store chemical energy 
in the form of carbohydrates?

Answers to this question will be useful for 
optimizing crop productivity across many 
plant species, because the balance between 
vegetative and reproductive development 
determines how much chemical energy will 
be converted to agricultural yield. The fact 
that different organs constitute sinks in sola- 
naceous crops and their relatives could enable 
us to identify genes that regulate the source- 
sink balance beyond those that control specific 
sink-organ traits. Such knowledge will allow
identification of evolutionary or breeder- 
selected sink-source innovations in one 
species that could then be deployed in other 
crops by using the plant breeder’s rapidly 
expanding molecular toolbox.

The rich diversity of petunias, combined 
with our new understanding of the genes that 
regulate this ornamental beauty, will facilitate 
a deeper understanding of the genetic language 
that regulates the glory of flowers. A bigger 
challenge, however, is to understand naturally 
occurring variability and diversification, and 
to use this knowledge to better conserve the 
biodiversity on which our future depends. 

If we are to use genomics to unpick evolu- 
tionary relationships between solanaceous and 
other species, each genome must be considered 
in the context of the organism's characteristics 
and the selection pressures that it faces in the 
wild. A major barrier to linking genomes and 
traits is a lack of consensus on how to anno- 
tate qualitative and quantitative traits in a 
computable manner’. A step towards this goal 
is the database developed by the Solanaceae 
Genome Network (https://solgenomics.net), 
which presents an ontology to describe traits 
from different plant species in a common 
framework”. 

The next step is to link genomes and traits in 
a bioinformatics framework that can associate
specific DNA sequences with traits that arise
at different stages of organismal development 
and in different environments. Finally, perhaps 
the greatest challenge in linking the genome to 
the traits that it encodes is social — persuading 
the scientific community to deposit its data in 
open-source platforms so that others can use 
them.

IMMUNOTHERAPY


Cancer vaccine triggers 
antiviral-type defences 


An immunotherapy approach targets nanoparticles to dendritic cells of the 
immune system, leading to an antitumour immune response with antiviral-like 
features. Initial clinical tests of this approach show promise. 


JOLANDA DE VRIES & CARL FIGDOR 


Preventive vaccines are perhaps the most
effective form of immunotherapy. But in
a paper online in Nature, Kranz et al.¹
describe a vaccination strategy against cancer
that targets existing tumours by recruiting
immune mechanisms normally used against
viral infection. The authors used nanoparticles
carrying tumour RNA to simulate the intrusion
of a viral pathogen into the bloodstream.
When the nanoparticles reach lymphoid
tissues, including the spleen and lymph nodes,
they activate antiviral defence mechanisms in
immune cells such as dendritic cells. The dendritic
cells translate RNA obtained from the
nanoparticles to express and present tumour
antigens (molecules used by the immune
response as attack targets) to the T cells of the
immune system, priming these cells to launch
an antitumour immune response.

Figure 1 | An antitumour nanoparticle vaccine. a, Kranz et al.¹ prepared
nanoparticles (lipid complexes containing RNA that encodes tumour
antigens), and report that they target dendritic cells and macrophages in mice.
Nanoparticle uptake by precursor dendritic cells causes them to develop
into mature antigen-presenting dendritic cells that migrate to the T cells.
Uptake of nanoparticles by plasmacytoid dendritic cells promotes secretion

Why is it so difficult to effectively vaccinate
against cancer? One reason is that cancer cells
are similar in many ways to normal cells and
the immune system avoids attacking the self.
Only relatively modest immune responses 
occur with vaccines containing antigens that 
are also expressed on healthy tissue. Strong 
immune responses can be expected only when 
cancer cells express antigens that are not usu- 
ally expressed in normal adult cells. 

Another reason is that the growth of a 
cancer is not accompanied by strong inflam- 
matory signals such as those that occur dur- 
ing microbial infection and which initiate a 
strong immune response. This leads to tumour 
microenvironments in which immune cells 
tolerate, or even promote, cancer growth’. 
Antitumour vaccines must therefore work 
when the disease has already taken hold, and 
often when it has spread throughout the body. 
Last, and in a key contrast to preventive vac- 
cinations against viruses, most cancers coexist 
and coevolve with our immune systems over 
years, resulting in an immunosuppressive 
tumour microenvironment that adds an extra 
obstacle for immunotherapy. 

In vaccine approaches for a range of dis- 
eases, specialized antigen-presenting cells have 
a pivotal role. Dendritic cells in particular are
extremely well suited to handling and presenting
antigens to activate T cells. Cultured dendritic
cells that have been loaded with antigens
in vitro can boost immunity when given to 
patients with cancer, but up to now the clinical 
efficacy of this strategy has been limited’. Most 
of these vaccines use dendritic cells that have 
been derived in vitro from white blood cells 
called monocytes. Ex vivo activation of differ- 
ent dendritic-cell subsets that naturally circu- 
late in the blood has also been investigated, 
using several types of dendritic cell including 
plasmacytoid dendritic cells, which produce 
high levels of the immune-response protein
interferon-α (IFNα) upon viral infection⁴.

Immunologists have also explored vaccines 
aimed at directly activating the patient’s own 
dendritic cells in vivo, which avoids laborious 
and expensive in vitro culture’. Such a vaccine 
requires at least three components: an ‘address 
label’ (a dendritic-cell-specific antibody or 
ligand molecule such as a carbohydrate)** 
that targets the dendritic cell; a tumour anti- 
gen; and a compound that readies the dendritic 
cells to fully activate T cells (usually a ligand 
for a Toll-like receptor (TLR)). Nanoparticles 
containing antigen and TLR ligands, along 
with targeting antibodies or other ligands, 
have proved effective in animal models’, 
and initial clinical trials using conjugates of 
dendritic-cell-targeting antibodies bound to 
a tumour antigen are under way (see ref. 10 for 
examples). 

of an initial wave of interferon protein that helps to prime the first steps of
T-cell activation. b, Translating the RNA within the nanoparticles, the mature
dendritic cells express tumour antigens and present them to the T cells.
Nanoparticle uptake by macrophages leads to a second wave of interferon
release, which fully primes the T cells against specific antigens. c, The primed
T cells then attack tumour cells.

Kranz et al. have developed a different type
of nanoparticle vaccine that does not require
antibodies or ligands to target the dendritic
cells. Instead, they made nanoparticles consisting
of RNA-lipid complexes¹. They first demonstrated
that, by making the nanoparticles
slightly negatively charged by manipulating
the RNA-to-lipid ratio, the particles can be 
directed to dendritic-cell-containing com- 
partments in the spleen and other lymphoid 
tissues when intravenously injected into mice. 
By using nanoparticles that carried RNA 
encoding a fluorescent protein, the authors 
observed that the distribution within the body 
was more dependent on the overall charge of 
the nanoparticle than on the type of lipid used. 
Fluorescence was observed in antigen-present- 
ing dendritic cells and in macrophages, another 
type of antigen-presenting cell (both of which 
express the molecular marker CD11c) in the 
marginal zone of the spleen and in other lym- 
phoid organs. Fluorescence was not observed 
in mice depleted of CD11c-expressing cells. 
Plasmacytoid dendritic cells did not fluoresce 
but showed other signalling responses that 
suggest that they have taken up nanoparticles. 

The researchers found that uptake of the 
nanoparticle RNA occurred by a cell-membrane-based
process called macropinocytosis.
Uptake was highest in macrophages. How- 
ever, the highest expression of RNA-encoded 
fluorescent marker was observed in dendritic 
cells, indicating that they are more effective 
than macrophages at enabling the ingested 
RNA to reach the cytoplasm and be translated 
into protein. 

Intriguingly, the authors observed two 
transient waves of IFNα after nanoparticle
injection (Fig. 1): the first was produced by 
plasmacytoid dendritic cells and peaked at 
2-3 hours after injection; it was followed by a 
macrophage-produced wave around 6-8 hours 
later. By testing an array of genetically modified 
mice, the authors show that IFNα secretion is
mediated by the receptor TLR7 and that the first
wave is necessary for precursor dendritic cells 
to mature and migrate to encounter T cells in
the spleen and lymph nodes. This leads to a
full-blown T-cell response (helped by the second
wave of IFNα secretion) against a range of
antigens in tumour models in mice, generating 
robust and long-lasting antitumour responses. 

Kranz et al. extended their research to an 
initial clinical study in patients with melanoma, 
using nanoparticles carrying RNA encoding 
tumour antigens, and present results from 
the first three patients treated. Impressively, 
immune responses were observed — although 
it is still early days, and a larger, randomized 
trial will be needed to validate these findings. 
All three patients produced IFNα and developed
strong T-cell responses against the immunizing
antigens, even though a smaller dose
of nanoparticles was used than in the mouse 
studies. The T-cell responses involved both 
CD4-type and CD8-type T cells; the activation 
of cytotoxic CD8 T cells is typical of an anti- 
viral response, and having both types of T-cell 
response usually improves anticancer action. 

The authors used intravenous injection to 
deliver the nanoparticles, but it would be inter- 
esting to explore other administration routes, 
which might alter their distribution. It would 
be worth examining the tissue distribution 
of radiolabelled nanoparticles in humans, as 
in the mouse experiments, to see if they also 
mainly target CD11c-expressing cells. Other 
immune-system cells that are marked by 
CD11c, such as neutrophils and monocytes,
also have a high phagocytic capacity, and 
might therefore be able to take up nanoparticle 
materials and become activated. If so, it is not 
clear what contribution these other cell types 
might make to producing immune-system 
signals such as cytokines. 

Kranz and colleagues’ study highlights 
the role of IFNα in obtaining robust T-cell
responses against tumours. Notably, CD8 (as
well as CD4) T-cell responses were observed in
both the mouse and human studies. Although 
CD8 T cells have long been known as the 
major class of immune cells acting in tumour 
eradication, different subtypes of dendritic 
cells stimulate different types of CD8 T cells’, 
and the contribution of CD4 T cells may have 
been underestimated. The responses in the 
three cancer patients are interesting given the 
different types of tumour antigen that were 
explored (including antigens that are not 
usually expressed in adult tissue and new 
antigens that arose owing to mutation within 
the tumour cells). This nanomedicine platform 
may give a strong boost to the vaccine field, 
and the results of forthcoming clinical studies 
will be of great interest.

Predictions of pinning


A multiscale model has been implemented that provides accurate predictions of 
the behaviour of ferroelectric materials in electric fields, and might aid efforts to 
design devices such as sensors and digital memory. SEE LETTER P.360 


PATRYCJA PARUCH & PHILIPPE GHOSEZ 


Hotter temperatures are probably the
last thing that you would want when
climbing out of a deep valley on a
summer hike in the mountains. However, 
when physical interfaces — such as propagat- 
ing cracks, or the edges of moving domains 
in certain materials — become pinned by a
potential-energy minimum, higher tempera- 
tures that amplify thermal fluctuations might 
be exactly what is needed to activate slow, 
highly nonlinear motion over the restric- 
tive energy barriers’. These barriers can be 
randomly distributed when associated with 
defects and disorder in a system, or periodic 
when arising from potential-energy variations 
that occur across the atomic planes of crystals, 
past which certain types of interface move 


sequentially”. On page 360, Liu et al.’ explore 
this second scenario for the movement of 
electrically polarized domains in ferroelectric 
materials, which are used in many optical 
devices, sensors and digital memory. The study 
provides an insightful description of domain 
motion in relation to the microscopic physics 
that underlies the properties of ferroelectric 
systems, and is generalizable to many differ- 
ent domain-wall geometries. 

Ferroelectric materials are characterized by 
spontaneous electric polarization that can be 
reversed by an external electric field. Polariza- 
tion can develop along two or more directions 
depending on the crystal symmetry of the 
material. Regions that have different directions
of polarization are called ferroelectric
domains, and typically coexist in any ferroelectric
sample, separated by thin interfaces
known as domain walls (Fig. 1a).

The domain structure and its evolution in 
an external electric field are intimately linked 
to the dynamics of polarization reversal. When 
an electric field is applied to a sample, polariza- 
tion aligned with the field direction becomes 
energetically most favourable. Domains with 
this orientation therefore grow at the expense 
of their neighbours, either through the for- 
mation and growth of new domains, or by the 
motion of existing domain walls (Fig. 1b). 

Domain-wall motion is a particularly inter- 
esting model system for the theoretical study 
of pinned elastic interfaces, which exhibit 
complex dynamic behaviours* due to the 
interplay between elastic behaviour (which 
tends to maintain a flat configuration) and the 
effects of local potential-energy variations. For 
ferroelectric domains at zero kelvin, no motion 
would occur until the electric field reached a 
critical value. At finite temperatures, however, 
thermal activation allows a highly nonlinear 
dynamic response that depends on the dimen- 
sionality of the system and the type of pinning, 
even for fields well below the critical value”®. 
Understanding the dynamics of this domain- 
wall motion is not only of academic interest, 
but is also crucial for technological applica- 
tions of ferroelectric materials. 
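
The contrast drawn above between zero-kelvin and finite-temperature behaviour can be illustrated with a toy calculation. The sketch below is an illustrative assumption, not the model discussed in this article: it treats a single point-like interface in a tilted cosine potential, U(x) = -(U0/2)cos(2πx/a) - Fx, for which the barrier vanishes at a critical drive F_c = πU0/a, and it uses a plain Arrhenius factor for thermally activated hopping. The function names, the 0.1 eV barrier and the attempt rate are hypothetical choices made only for this example.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def barrier(reduced_field: float, u0: float) -> float:
    """Barrier height (same units as u0) of a tilted cosine potential.

    Toy potential (an assumption, not the authors' model):
        U(x) = -(u0 / 2) * cos(2*pi*x / a) - F * x
    with reduced_field = F / F_c and critical drive F_c = pi * u0 / a.
    """
    if reduced_field < 0.0:
        raise ValueError("reduced_field must be non-negative")
    if reduced_field >= 1.0:
        return 0.0  # at or above the critical drive the barrier vanishes
    f = reduced_field
    return u0 * (math.sqrt(1.0 - f * f) - f * math.acos(f))


def hop_rate(reduced_field: float, u0_ev: float, temperature_k: float,
             attempt_rate: float = 1.0) -> float:
    """Arrhenius estimate of the rate of hopping over one barrier."""
    height = barrier(reduced_field, u0_ev)
    if height == 0.0:
        return attempt_rate  # no barrier left: motion is not activated
    if temperature_k <= 0.0:
        return 0.0  # no thermal activation at zero kelvin
    return attempt_rate * math.exp(-height / (K_B_EV_PER_K * temperature_k))


if __name__ == "__main__":
    # Hypothetical barrier of 0.1 eV, drive at half the critical value.
    for temperature in (0.0, 150.0, 300.0):
        print(temperature, hop_rate(0.5, 0.1, temperature))
```

In this toy picture the hop rate is exactly zero below the critical drive at zero kelvin, yet finite and steeply field-dependent at any finite temperature, which is the qualitative behaviour described above. It ignores the elasticity and dimensionality of a real domain wall.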

Figure 1 | Modelling domain motion in ferroelectric materials.
a, Ferroelectric materials consist of domains that are electrically polarized
in different directions (arrows) and separated by domain walls. b, When
an electric field is applied to a ferroelectric material, domains polarized in
the direction of the field grow at the expense of others (P, polarization). The
square ‘nucleus’ represents the initial growth of the red domain into the
blue domain. c, Liu et al.³ report a theoretical description of domain-wall
motion that depends on the potential-energy variation associated with the
crystal lattices of ferroelectric materials. The method starts by calculating
the structure and energetics of a few hundred atoms at the atomic scale
and quantum level, then progressively ‘zooms out’ to a larger scale (up to
850,000 atoms; not shown), before finally deriving an analytical description
of the bulk material that internalizes key parameters derived in the first two
steps, but does not explicitly consider the microscopic details.

However, obtaining a general, widely
applicable description is complicated by
the wide diversity of ferroelectric materials, 
from almost-perfect single crystals, through 
mono- and polycrystalline thin films that 
generally have higher densities of defects 
than single crystals, to ceramics, which can 
be porous and have multiple grain boundaries 
(interfaces between the microscopic crystals 
that make up the bulk material). If defects 
dominate domain-wall pinning, then highly 
variable, sample-specific motion is expected. 

Nonetheless, all of these materials have 
something in common: the intrinsic micro- 
scopic variation of potential energy across 
the atomic planes of their crystal lattices acts 
as a periodic pinning potential that restricts 
domain-wall motion. Liu and colleagues focus 
on this simple universal feature, and show that, 
in many cases, a theoretical description of its 
effects provides surprisingly accurate pre- 
dictions of the macroscopic, technologically 
relevant behaviour of real samples. 

To achieve this, the authors implemented a 
multiscale theoretical approach that progres- 
sively internalizes the microscopic electronic 
and ionic degrees of freedom of ferroelectric 
systems. They started from first-principles 
calculations based on quantum mechan- 
ics, which provided key information about 
the structure and energetics of model ferro- 
electric materials, but which were limited 
by computational resources to encompass a 
few hundred atoms at zero kelvin (Fig. 1c). 
Using the first-principles results, they built 
model interatomic potentials that allowed 
them to study much larger systems — of up to 
845,000 atoms — and to explore the motion 
of a field-driven domain wall at finite tem- 
peratures using classical molecular-dynamics 
simulations. 

The data acquired from these two scales 
served as ingredients for an analytical, 
phenomenological model of domain-wall 
motion in the bulk material, considered as a 
continuous medium that does not explicitly 
take into account the atomic structure. The 
authors also explain how their approach can 
be ingeniously generalized to domain walls in 
crystals of different symmetries in both purely 
ferroelectric and ferroelectric—ferroelastic sys- 
tems (which exhibit spontaneous deformation 
as well as electrical polarization). 

Liu and colleagues’ ab initio, multiscale 
approach does not rely on empirical para- 
meters, and thus allows several assumptions 
in existing theories of domain-wall motion to 
be re-examined — for instance, by allowing a 
more realistic understanding of the shape of 
critical nuclei (the initial regions of polari- 
zation that form and grow into new ferro- 
electric domains; Fig. 1b) — and corrected as 
necessary. Their work demonstrates that zero- 
temperature microscopic quantities calculated 
from first principles are relevant to descrip- 
tions of complex macroscopic phenomena 
at finite temperatures, and it provides a
concrete process for the rapid calculation of the
latter from the former. This process could be 
integrated into high-throughput ab initio 
platforms for material design and is a new tool 
for optimizing ferroelectrics’. 

Although the model does not consider 
disorder or defects in materials, in many cases 
its predictions of coercive fields (the electric 
fields needed to reverse the polarization) are 
in excellent agreement with experimental 
measurements*, emphasizing the key role 
of periodic crystal pinning in ferroelectric 
systems. Moreover, even for studies of ferro- 
electric domain walls in which periodic 
pinning is clearly insufficient to describe 
domain-wall behaviour’, Liu and co-workers’ 
results will be useful for comparing the relative 
energies needed for motion of different types 
of domain wall, and determining whether 
switching is purely ferroelectric, or proceeds 
by linked, ferroelastic steps. 

Finally, it should be possible to use other 
microscopic periodic pinning potentials 
— such as those artificially induced during 
the growth of ferroelectric materials on spe- 
cially prepared substrates — in the molecu- 
lar-dynamics simulations used to generate 
the analytical models, and in the analytical
models themselves. This could be used to
engineer domain-wall pinning sites for future
nanoelectronics applications”’.

Brain versus brawn


The mechanisms that underlie enforced transitions between mature cell lineages 
are poorly understood. Profiling single skin cells that are induced to become 
neurons reveals that, unexpectedly, they often become muscle. SEE LETTER P.391 


BRUNO DI STEFANO & 
KONRAD HOCHEDLINGER 


Differentiated cells maintain their
identity after development, ensuring
specialized tissue function throughout
adult life. However, they can be experimentally
forced to change their identity — for
instance, skin cells called fibroblasts can be
reprogrammed to become more-primitive,
embryonic-like stem cells¹, or transdifferentiated
into other specialized cell types such as
muscle², blood³ or neural cells⁴. These techniques
are invaluable for studying cell plasticity
and hold promise as possible tools for
treating degenerative diseases, but the process
is typically slow and inefficient. On page 391,
Treutlein et al.⁵ address these shortcomings
by presenting a molecular road map of fibro- 
blasts as they convert to neurons, and provide 
intriguing evidence that such neuronal trans- 
differentiation often entails an unanticipated 
tug-of-war between alternative outcomes. 
Reprogramming and transdifferentiation 
experiments typically involve overexpression
of regulatory transcription factors that bind
to DNA and induce gene-expression patterns 
characteristic of a specific cell type®. One of 
the groups that carried out the current study 
previously identified* three brain-specific 
transcription factors — Brn2, Ascl1 and Myt1l,
collectively known as BAM factors — whose 
overexpression in vitro converts fibroblasts 
into cells that resemble brain-derived neurons. 
This group also demonstrated’ that overex- 
pression of Ascl1 alone can induce transdif- 
ferentiation into neuron-like cells, albeit with 
lower efficiency than the BAM cocktail. With 
either approach, most cells resist transdifferen- 
tiation, but the reasons for this remain unclear. 

Single-cell RNA sequencing is a useful tool 
for assessing global gene-expression patterns 
in rare cell types within mixed cell popula- 
tions®. Treutlein et al. apply this technology 
to neuronal transdifferentiation mediated by 
either BAM or Ascl1 alone. They measure total 
RNA levels in 405 single cells after 0, 2, 5 and 
22 days of culture. They then use sophisticated 
computational tools to visualize data obtained 
from the cultures as a two-dimensional
representation and thus reconstruct the progression
of the fibroblasts into neurons over time.

Figure 1 | A tug-of-war between cell types. Expression of the brain-specific transcription factors Brn2,
Ascl1 and Myt1l in skin cells called fibroblasts triggers the cells’ direct conversion into neurons. Treutlein
et al.⁵ discover that this neuronal transdifferentiation involves two waves of transcriptional change,
initiation and maturation. Most cells progress through initiation and take on an intermediate identity
in a process orchestrated by Ascl1. During maturation, only a few cells turn on genes associated with
mature neurons. The authors find that the three factors promote the acquisition of a proper neural fate in
such cells. Brn2 and Myt1l seem to actively prevent transdifferentiation towards muscle — a fate that is
promoted by Ascl1.

These experiments reveal that transdif- 
ferentiating fibroblasts transit through two 
discernible stages, which the authors dub ini- 
tiation and maturation (Fig. 1). During initia- 
tion, the cells cease to express genes that are 
characteristic of fibroblasts, stop proliferating 
and transiently activate genes whose expres- 
sion marks neuronal progenitor cells. These 
changes take place in most cells and are orches- 
trated by Ascl1. By contrast, only a subset of 
cells progresses to maturation. This phase is 
characterized by the activation of genes that 
establish, and subsequently maintain, a mature 
neuronal lineage, and involves all three BAM 
factors. Thus, the transition from initiation 
to maturation represents a bottleneck during 
fibroblast-to-neuron conversion, correlating 
with the low efficiency of transdifferentiation. 
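
To give a flavour of how a two-dimensional representation can order cells along a conversion path, the sketch below projects a synthetic expression matrix onto its two leading principal components. Principal-component analysis is used here as a simple stand-in and is an assumption: the text does not specify which computational tools the authors used. The data are random numbers with a stage-dependent shift in five hypothetical genes, and every function name is invented for this example.

```python
import random


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def centre(rows):
    """Subtract each gene's (column's) mean across all cells."""
    n_genes = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_genes)]
    return [[r[j] - means[j] for j in range(n_genes)] for r in rows]


def leading_axis(rows, iterations=100):
    """Approximate the top principal axis of centred rows by power iteration."""
    n_genes = len(rows[0])
    axis = [1.0 / n_genes ** 0.5] * n_genes
    for _ in range(iterations):
        scores = [dot(r, axis) for r in rows]  # X v
        new = [sum(s * r[j] for s, r in zip(scores, rows))
               for j in range(n_genes)]        # X^T (X v)
        norm = dot(new, new) ** 0.5
        axis = [x / norm for x in new]
    return axis


def project_2d(expression):
    """Return one (pc1, pc2) pair per cell for a cells-by-genes matrix."""
    centred = centre(expression)
    axis1 = leading_axis(centred)
    pc1 = [dot(r, axis1) for r in centred]
    # Remove the first axis from the data, then repeat for a second axis.
    residual = [[x - s * a for x, a in zip(r, axis1)]
                for r, s in zip(centred, pc1)]
    axis2 = leading_axis(residual)
    pc2 = [dot(r, axis2) for r in residual]
    return list(zip(pc1, pc2))


def synthetic_cells(cells_per_stage=10, n_genes=20, seed=0):
    """Made-up data: four sampling stages, five genes that rise with stage."""
    rng = random.Random(seed)
    rows, stages = [], []
    for stage in range(4):
        for _ in range(cells_per_stage):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for j in range(5):
                row[j] += 3.0 * stage
            rows.append(row)
            stages.append(stage)
    return rows, stages


def stage_means(coords, stages):
    """Mean first-axis coordinate for each stage."""
    means = []
    for stage in sorted(set(stages)):
        values = [c[0] for c, s in zip(coords, stages) if s == stage]
        means.append(sum(values) / len(values))
    return means


if __name__ == "__main__":
    rows, stages = synthetic_cells()
    print(stage_means(project_2d(rows), stages))
```

Running the script prints one mean coordinate per sampling stage. Because the synthetic shift grows with stage, the stage means are spread along the first axis in stage order, up to an overall sign that is arbitrary in this kind of projection.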

One plausible explanation for this bottle- 
neck is that rare cell types in the mixed cultures 
are uniquely susceptible to transdifferen- 
tiation. This would predict the existence of a 
transcriptionally distinct subset of fibroblasts. 
However, when examining gene expression in 
73 individual fibroblasts, Treutlein et al. find 
no evidence for distinct subsets, and cultures 
seem remarkably homogeneous, making this 
possibility unlikely. 

Another possibility is that the viral vector 
used by the authors to deliver the transcrip- 
tion factors into cells is silenced. To address 
this point, Treutlein et al. compare the expres- 
sion of the introduced Ascl1 transgene in doz- 
ens of single cells. Although Ascl1 is initially 
expressed in most fibroblasts, the transgene 
is frequently silenced as cells transition to the 
maturation phase. These data suggest that viral 
silencing accounts, at least in part, for the low 
efficiency of neuronal transdifferentiation. 

Perhaps the most surprising result emerges 
from a comparison of gene-expression data in 
single cells that activate the neuronal marker 
gene Tau after receiving either Ascl1 alone or 
the BAM factors. This analysis reveals that cells 
that receive only Ascl1 adopt a muscle-like 
(myogenic) gene-expression program despite 
activating Tau, whereas Tau-expressing cells 
that received BAM assume the expected neu- 
ronal fate. Unfortunately, the frequency with 
which Ascl1 expression alone gives rise to bona
fide neurons and the functionality of the myo- 
genic cells remain unclear. Nonetheless, these 
data suggest that Myt1l and Brn2 not only pro- 
mote neuronal identity, but also prevent acqui- 
sition of a competing myogenic fate (Fig. 1). 

Treutlein and colleagues’ study underscores 
the power of single-cell technology to deconstruct
complex cell-fate transitions across
different lineages and time. Key questions that
remain are why and how the Ascl1 transgene 
is silenced in fibroblasts, and whether an 
understanding of this inhibition would help 
to reveal the mechanisms that normally safe- 
guard cell identity. Of note, recent data”’® 
suggest that the mechanisms that facilitate 
viral-transgene silencing also act as barriers 
to reprogramming and transdifferentiation, 
suggesting a functional connection between 
these processes. 

Another unresolved question is how Ascl1
supports transdifferentiation into myogenic 
cells, given that it is not normally expressed 
in muscle. Ascl1 belongs to a class of ‘pioneer 
factors’ that can associate with and regulate 
dormant regions of the genome that are inac- 
cessible to other transcription factors. As such, 
its forced expression in fibroblasts might trig- 
ger nonspecific binding to hundreds of DNA 
sequences, including inactive muscle genes. 
Alternatively, Ascl1 might partner with another
transcription factor that redirects it to muscle 
genes. A potential candidate partner is the pro- 
tein MEF2, which is highly expressed both in 
fibroblasts and during muscle development, 
and has been shown” to interact with Ascl1.

Regardless of the mechanisms that give rise
to myogenic cells, Treutlein and colleagues’ 
results have several practical implications. 
For example, the data suggest that identify- 
ing transdifferentiated neurons using a single 
marker such as Tau is insufficient, because 
the gene is also activated in cells that acquire 
a myogenic fate. The authors’ findings further 
imply that the general trend of reducing the 
set of transdifferentiation-inducing transcriptional
regulators to a minimum, as was
the case here and in previous studies, may have 
unintended consequences. 

This is particularly relevant in a therapeu- 
tic setting, in which it is crucial to transplant 
well-defined, homogeneous cell populations. 
A case in point is the finding” that liver-like
cells transdifferentiated from fibroblasts seem 
to be more similar to progenitors of the intes- 
tinal lining than to liver cells, and accordingly 
give rise to functional intestinal cells in mice. 
It is tempting to speculate that removing, add- 
ing or exchanging transcription factors from 
other transdifferentiation cocktails, combined 
with a more in-depth molecular analysis of the 
generated cell types, might uncover other 
similar shifts in cell fate or maturity.

Stem cell function and stress response
are controlled by protein synthesis 


Sandra Blanco, Roberto Bandiera, Martyna Popis, Shobbir Hussain, Patrick Lombard, Jelena Aleksic, Abdulrahim Sajini,
Hinal Tanna, Rosana Cortés-Garrido, Nikoletta Gkatza, Sabine Dietmann & Michaela Frye


Whether protein synthesis and cellular stress response pathways interact to control stem cell function is currently 
unknown. Here we show that mouse skin stem cells synthesize less protein than their immediate progenitors in vivo, even 
when forced to proliferate. Our analyses reveal that activation of stress response pathways drives both a global reduction 
of protein synthesis and altered translational programmes that together promote stem cell functions and tumorigenesis. 
Mechanistically, we show that inhibition of post-transcriptional cytosine-5 methylation locks tumour-initiating cells 
in this distinct translational inhibition programme. Paradoxically, this inhibition renders stem cells hypersensitive to 
cytotoxic stress, as tumour regeneration after treatment with 5-fluorouracil is blocked. Thus, stem cells must revoke 
translation inhibition pathways to regenerate a tissue or tumour. 


Protein synthesis is a fundamental process for all cells, but its precise 
regulatory roles in development, stem cells and cancer are not well 
understood. We recently identified post-transcriptional methylation 
of transfer RNA (tRNA) at cytosine-5 (m5C) by NSUN2 as a novel
mechanism to repress global protein synthesis. Loss of Nsun2 causes
hypomethylation of tRNAs, allowing endonucleolytic cleavage by
angiogenin and accumulation of 5’ tRNA fragments. These fragments
repress cap-dependent protein translation.

Correct RNA methylation is essential for development and tis- 
sue homeostasis. Loss-of-function mutations in human NSUN2 
cause growth retardation and neurodevelopmental defects including 
microcephaly. In mouse, Nsun2-associated microcephaly can be
rescued by inhibiting angiogenin-mediated tRNA cleavage. In adult
tissues (testis and skin), NSUN2 is only expressed in a subpopulation
of committed progenitors, in which its activity balances self-renewal
and differentiation.

Here, we reveal that the interplay between RNA methylation and 
translation shapes stem cell fate. Using skin as a model, we demon- 
strate that stem cells have lower protein synthesis than committed cells 
in both homeostasis and tumorigenesis. Low translation function- 
ally contributes to maintaining stem cells, and is not merely a conse- 
quence of quiescence or cell cycle state. By genetically deleting Nsun2 
in a tumour mouse model, we find that protein synthesis is globally 
repressed; however, distinct transcripts escape this repression and 
establish a translational programme crucial to stimulate stem cell func- 
tions. Unexpectedly, the selective alteration of translation is remarka- 
bly effective in rendering stem cells sensitive to cytotoxic stress. 


Protein synthesis is low in stem cells 
In skin, the best-characterized stem cell populations reside in the hair
follicle. Hair follicle stem cells (HFSCs) are periodically activated
at the onset of hair growth (anagen), which is followed by phases of
regression (catagen) and rest (telogen) (Extended Data Fig. 1a).
HFSCs located in the bulge express the stem cell markers CD34, K19
(also known as KRT19) and LGR5 (Fig. 1a).

To visualize HFSCs and their progeny, we genetically labelled K19-
and LGR5-expressing bulge stem cells with a tdTomato (tdTom)
reporter (Fig. 1a, b and Extended Data Fig. 1a). To measure global
protein synthesis we quantified incorporation of O-propargyl-puromycin
(OP-puro) into nascent proteins (Fig. 1b). Protein synthesis
was uniformly low in the interfollicular epidermis, but highly
dynamic in hair follicles throughout the hair cycle (Extended Data
Fig. 1b). In telogen, highly translating cells at the follicle base were not
stem cells, as they were negative for tdTom (Fig. 1c, d and Extended
Data Fig. 1c). In late anagen, OP-puro co-localized with tdTom in
committed progenitors located in the hair bulb (Fig. 1e, f and Extended
Data Fig. 1d, arrows). The highest translation was displayed above the
hair matrix, which contains committed progenitors that divide a finite
number of times before differentiating (Fig. 1e, f and Extended Data
Fig. 1d, arrowheads).

Co-labelling of OP-puro with markers for all hair lineages identified
the Henle’s and Huxley’s layers of the inner root sheath (IRS) as
the lineages with highest translation (Fig. 1g–k and Extended Data
Fig. 1e, f). Both IRS layers exclusively contain committed and
differentiated cells.

To quantify protein synthesis fully in distinct epidermal populations,
we flow-sorted bulge stem cells (CD34+/ITGA6+), non-bulge
cells (CD34−/ITGA6+), and differentiated cells (CD34−/ITGA6−)
(Fig. 2a–c). To capture epidermal cells giving rise to the highly
translating IRS, we enriched for OP-puro^high cells (top 2.5% in rate of
translation) (Fig. 2b). The selection for high translation did not perturb
the proportion of cell populations found in the epidermis (Extended
Data Fig. 2a–d). Quantification of OP-puro incorporation confirmed
that protein synthesis was highest in differentiated populations in late
anagen (Fig. 2d). Translation in bulge stem cells significantly increased
from telogen to anagen (Fig. 2d), suggesting a correlation between
translation rate and stem cell activation.

Next, we focused on HFSCs and their progeny and quantified protein
translation in tdTom+ cells that were sorted into bulge stem cells,
non-bulge cells, and differentiating cells (Fig. 2e, f). Translation rates
significantly increased in bulge HFSCs from telogen to anagen (Fig. 2e, f).
In addition, the average translation rate increased in differentiating
cells in late anagen, and was around twofold higher compared to the
background cells (tdTom−) (Fig. 2d–f and Extended Data Fig. 2e, f).
These results were robust to the specific threshold used to identify cells
as highly translating (top 2.5–50%) (Extended Data Fig. 3a–c).
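
The selection of highly translating cells at different thresholds can be illustrated with a short sketch. It is not the authors' analysis code: the function name `top_fraction`, the toy intensity values and the rule of rounding the number of retained cells up are all assumptions made for illustration.

```python
import math


def top_fraction(signals, fraction):
    """Return the highest OP-puro signals making up `fraction` of the cells.

    `signals` is a list of per-cell fluorescence values (arbitrary units).
    Rounding the number of retained cells up is an assumption of this sketch.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, math.ceil(len(signals) * fraction))
    return sorted(signals, reverse=True)[:keep]


# Hypothetical per-cell OP-puro intensities: 200 cells with signal 1..200.
cells = [float(i) for i in range(1, 201)]

# Repeat the selection at several cutoffs to see how the gate moves.
for fraction in (0.025, 0.25, 0.5):
    selected = top_fraction(cells, fraction)
    print(fraction, len(selected), min(selected))
```

Re-running a downstream comparison on each selected subset is one way to check that a conclusion does not hinge on a single cutoff.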
Biology & Biochemistry, University of Bath, Claverton Down, Bath BA2 7AY, UK. University of Cambridge, CR-UK, Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
Figure 1 | HFSCs synthesize less protein than their progeny.
a, Epidermal populations analysed. BG, bulge; DP, dermal papilla;
HG, hair germ; IFE, interfollicular epidermis; SG, sebaceous gland.
b, Treatment regimes. P, postnatal day. c–f, Detection of tdTom and
OP-puro in back skin of K19/LGR5Cre tdTom mice in telogen (c, d)
and late anagen (e, f). Arrows indicate tdTom+ cells (magnification,
bottom panels). Arrowheads indicate tdTom+/OP-puro^high cells. Dotted
line indicates lower bulge. g–j, OP-puro and hair follicle lineage marker
detection (late anagen). Dotted lines indicate cross-section (1, 2).
k, Schematic summary of g–j. Ch, cuticle; Ci, cuticle of inner root sheath;
Co, cortex; Cp, companion layer; He, Henle’s layer; Hu, Huxley’s layer; Me,
medulla; ORS, outer root sheath. OP-puro+ layers (green). Scale bars, 50 μm.
Thus, as stem cells proceed into a fully committed progenitor state,
protein translation steadily increases. 


Proliferation does not dictate translation 

Protein synthesis was highest in growing hair follicles. However, 
cellular division alone did not explain translation rates as the greatest 
protein synthesis was found in differentiating but non-dividing 
(Ki67−) cells (Fig. 2g). Although the percentage of cycling (S/G2/M)
cells correlated with increasing translation rates (Extended Data
Fig. 3d, e), differentiating (CD34−/ITGA6−) and non-dividing
(G1/G0) cells represented the population with the highest translation 
(Extended Data Fig. 3f, g). 

To test directly whether protein synthesis was determined by 
lineage commitment instead, we measured the translation rate in 
bulge HFSCs and their offspring (tdTom+) along the cell cycle. In late
anagen, non-cycling (G1/G0) tdTom+ cells synthesized significantly
more protein than their cycling (S/G2/M) counterparts (Fig. 2h and 
Extended Data Fig. 3h). Thus, increasing translation rates correlated 
with stem cell commitment and differentiation rather than proliferation 
(Extended Data Fig. 4p). 


Low translation in tumour-initiating cells
To test whether low protein synthesis simply reflected a quiescent state, 
we investigated translation rates in cancer-initiating cells, which exhibit 
both high self-renewal and proliferation capacity. We used K5-Sos mice, 
which constitutively activate RAS in basal epidermal cells and develop 
well-differentiated tumours resembling human squamous tumours.
Undifferentiated progenitors expressed markers for tumorigenesis
and tumour-initiating cells (ITGB1, ITGA6, CD44, CD34, PDPN)
and exhibited lower protein synthesis than committed progenitors
(Fig. 3a, d and Extended Data Fig. 4a–c, f–j). Translation was highest
in suprabasal and differentiating committed progenitors (K10+), but
absent in terminally differentiated, non-tumorigenic cells (Fig. 3a, b).
In cancer, elevated translation has been associated with increased
proliferation. However, in our data, high translation was uncoupled
Figure 2 | Protein synthesis correlates with differentiation.
a–c, Experimental set up. d–f, Violin plots of normalized protein synthesis in
OP-puro^high cells sorted for the indicated epidermal populations (c).
g, Ki67 and OP-puro detection (late anagen). Arrowheads indicate
Ki67−/OP-puro+ cells. Scale bar, 50 μm. h, Box plots of protein synthesis in
cycling (S/G2/M) and non-dividing (G1/G0) OP-puro^high cells. n = mice.
*P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001 (two-tailed Student’s
t-test). Source Data for this figure is available in the online version of the paper.


from proliferation because both OP-puro^high and OP-puro^low cells
expressed Ki67 (Fig. 3c), and protein synthesis did not correlate with 
cycling cells (Fig. 3e). 

Thus, similar to normal skin, stem and progenitor cells in tumours 
produced less protein than their committed progeny. 


Low translation maintains tumour stem cells 

To test whether low translation is a cause or a consequence of a stem 
cell state requires the ability to modulate protein synthesis. An excellent 
system is the genetic deletion of the RNA-methyltransferase NSUN2. 
NSUN2 modulates global translation by protecting tRNAs from 
cleavage (ref. 1). In normal skin, NSUN2 is restricted to distinct hair follicle
populations that overlap with OP-puro^high cells in early and late
anagen (Extended Data Fig. 4k, l). Nsun2 deletion delayed HFSC
differentiation in adult and developing skin (Extended Data
Fig. 4m–o). NSUN2 is upregulated in epithelial tumours and homogeneously
expressed in mouse and human squamous cell carcinomas
(Extended Data Fig. 5a), and its expression is restricted to highly
translating cells in K5-Sos tumours (Fig. 3f). 

We deleted Nsun2 in K5-Sos mice, and measured OP-puro incorpo- 
ration into the tumours of the offspring. As expected, Nsun2 ablation 
reduced protein synthesis in tumours (Fig. 3g-i and Extended Data 
Fig. 4d, e). K5-Sos/Nsun2−/− mice developed more tumours, which
appeared earlier and grew larger, and had a reduced life span (Fig. 4a and
Extended Data Fig. 5b–d).

Nsun2−/− tumours appeared more proliferative; however,
5-ethynyldeoxyuridine (EdU)/5-bromodeoxyuridine (BrdU) pulse-chase
experiments revealed that high EdU incorporation reflected an
increased undifferentiated population, but not a faster division rate
(Fig. 4b, c and Extended Data Fig. 5e, f). Nsun2−/− tumours were
poorly differentiated and in a later stage of tumorigenesis, as shown 
by increased expression of stem cell and tumour progression markers 
(Fig. 4d–i and Extended Data Fig. 5f–j).
Figure 3 | Tumour-initiating cells synthesize less protein than their
progeny. a–c, Co-labelling OP-puro (OP-P) with the indicated markers.
Arrows indicate marker-positive cells. Boxed areas in the left-hand
panels are shown magnified in the panel to the right. d, Flow cytometry
for OP-puro incorporation. e, Percentage of dividing cells (S/G2/M) and
normalized protein synthesis (mean ± standard deviation (s.d.); n = mice).
f, Co-staining OP-puro, NSUN2, ITGA6. Arrows indicate
OP-puro+/NSUN2+ cells. Boxed areas in the left-hand panel are shown magnified in
the panel to the right. g–i, OP-puro detection in sections (g, h) or by flow
cytometry (i). Boxed areas in the left-hand panels are shown magnified
in the panel to the right. Arrowheads indicate OP-puro^low cells. Nuclei are
stained with 4’,6-diamidino-2-phenylindole (DAPI). Scale bars, 50 μm
(a, b, c, f, g, left, h, left); 25 μm (g, right, h, right). Dotted line indicates
basal membrane. All analyses are in K5-Sos tumours. WT, wild type.
Source Data for this figure is available in the online version of the paper.


To test for the cell-intrinsic potential to initiate tumours, we injected 
Nsun2−/− tumour cells subcutaneously into nude mice (Extended Data
Fig. 6a). Only Nsun2−/− cancer cells reconstituted the original squamous
tumour with high proliferative potential and elevated levels of
ITGB1 and PDPN (Extended Data Fig. 6b-f). Thus, Nsun2 deletion 
enhances the self-renewal potential of tumour-initiating cells in a 
cell-autonomous manner. 

Furthermore, in human skin cancers, NSUN2 expression was 
inversely correlated with malignancy when we compared protein expres- 
sion levels in normal skin and cutaneous cancers of increasing tumour/ 
node/metastasis (TNM) stages (Fig. 4j and Extended Data Fig. 6g–m).

These results indicate that the reduction of translation rates caused 
by Nsun2 deletion increased the tumour-initiating population. 


tRNA fragments modulate translation 

A likely mechanism for the translational repression in Nsun2-deficient 
tumours was that 5’ tRNA fragments inhibit protein synthesis. Using
RNA bisulfite sequencing, we confirmed that in tumours, NSUN2-dependent
methylation occurred at most tRNAs (65%), but only at a
small proportion of messenger RNA exons (2%) and introns (Fig. 5a, b,
Extended Data Fig. 7a–e and Supplementary Tables 1, 2). In the
few methylated mRNAs, NSUN2-target sites were enriched close to 
transcriptional start sites, but were uncorrelated with RNA abundances 
(Extended Data Fig. 7f, g and Supplementary Tables 1, 3). In contrast, 
Figure 4 | Nsun2 deletion promotes stem cell identity and
tumorigenesis. a, Tumour incidence. Daggers indicate that the mice
died. b, c, Detection (b) and quantification (c) of pulsed-chased BrdU+
and EdU+ cells in tumours (see Methods). n = 5 slides × 3 mice.
d, e, Immunostaining for ITGA6, K10 (d) and PDPN (e). Arrows
indicate marker-positive cells. Nuclei are stained with DAPI; dotted line
indicates basal membrane. Scale bars, 100 μm. f–i, Flow cytometry
(f, g) and quantification (h, i) of marker-positive tumour cells. n = mice
(mean ± s.d.). j, NSUN2 protein expression in human normal skin or tumours
(mean ± s.d.). *P < 0.05, ***P < 0.001 (two-tailed Student’s t-test). Source Data
for this figure is available in the online version of the paper.


hypomethylation of tRNAs directly caused by loss of Nsun2 led to the 
accumulation of 5’ tRNA fragments (Fig. 5c, d, Extended Data Fig. 7i–l
and Supplementary Tables 2, 4). 

We performed ribosome profiling to evaluate how 5’ tRNA fragments 
influenced translation in mouse tumours and patient-derived NSUN2- 
deficient fibroblasts (Extended Data Fig. 8a, b and Supplementary 
Tables 5, 6a–c). We verified the high quality of our data by testing
for triplet periodicity of ribosomal footprints, increased ribosomal
density near translational start sites, and correlation between RNA
expression levels and translation (Extended Data Fig. 8c–j).
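
As an illustration of the first of these quality checks, triplet periodicity can be summarized as the fraction of footprints falling into each reading frame. The sketch below is not the authors' pipeline: the function name `frame_fractions` and the toy offsets are hypothetical, and it assumes that each footprint has already been reduced to a single position in nucleotides relative to the annotated start codon.

```python
from collections import Counter


def frame_fractions(offsets):
    """Return the fraction of footprints in reading frames 0, 1 and 2.

    `offsets` holds one position per footprint, in nucleotides relative to
    the start codon (an assumption of this sketch).
    """
    if not offsets:
        raise ValueError("no footprints supplied")
    counts = Counter(offset % 3 for offset in offsets)
    return [counts[frame] / len(offsets) for frame in range(3)]


# Hypothetical footprint positions; most of them sit in frame 0.
offsets = [0, 3, 6, 9, 12, 15, 1, 4, 2, 18]
fractions = frame_fractions(offsets)
print(fractions)
```

A strong excess in one frame is what would be expected of footprints from translating ribosomes, whereas a roughly even split across the three frames would suggest fragments that are not ribosome-protected.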

The distinct translational programme in Nsun2−/− mouse tumours
was not driven by transcriptional alteration, as the changes in protein 
synthesis caused by Nsun2 removal were decoupled from the corre- 
sponding changes in RNA expression levels (Fig. 5e and Extended Data 
Fig. 8e). The differences in translation were more likely to be caused 
by accumulated 5’ tRNA fragments than by changes in mRNA meth- 
ylation, because translation of NSUN2-methylated mRNAs remained 
unaltered (Extended Data Fig. 7h). 

In summary, the undifferentiated cellular phenotype of Nsun2−/−
tumours was primarily driven by translational, and not transcriptional 
changes. 


Translational signatures in NSUN2−/− cells
Accumulation of 5’ tRNA fragments can activate a cap-independent
stress-response programme, and stress stimuli can increase ribosomal
density in 5’ untranslated regions (UTRs). Consistent with
such a stress response, 5’ UTRs in NSUN2-deficient cells showed
increased ribosome densities (Fig. 5f, g, Extended Data Fig. 8k, l
Figure 6 | Nsun2 deletion sensitizes tumour-initiating cells to cytotoxic
stress. a–c, Tumour size (mean ± standard error of the mean (s.e.m.))
(a) and Ki67 detection (b, c) in control (Ctr) or 5FU-treated mice. d, K10
and ITGA6 detection in treated Nsun2−/− tumours. Arrows indicate
K10+/ITGA6+ cells. e, Tumour size in mice treated with 5FU and/or
and Supplementary Tables 7–9). The increased ribosome density
in 5’ UTRs is probably due to the occurrence of upstream open
reading frames (uORFs). Functionally, uORFs repress translation
by sequestering initiation events or facilitating downstream
re-initiation and translation, which may explain why protein
synthesis of the corresponding coding sequences (CDS) remained
unaltered (Fig. 5f, g).

Although the underlying mechanisms are unclear, differential ribo- 
some density in 5’ UTRs should alter the protein production of distinct 
genes. Indeed, transcripts with increased ribosome density in 5’ UTRs 
were linked to apoptosis, stress response, cell shape and migration 
(Fig. 5h-j). In tumours, transcripts with reduced ribosome density in 
the CDS were related to differentiation (Fig. 5h, i). Thus, the ribo- 
some profiling data correlated well with the phenotypic reduction of 
epidermal differentiation of Nsun2-deficient tumours; and the cell- 
intrinsic NSUN2-controlled translational programme(s) related to 
stress responses and cell motility was conserved between species. 

To identify the translational programme that directly depended on 
RNA methylation, we performed ribosomal profiling after rescuing 
NSUN2−/− human fibroblasts with the wild-type or enzymatically
dead constructs of NSUN2 (Extended Data Fig. 9a–d). Modulators
of cell adhesion and motility represented a quarter of translationally
repressed transcripts that depended on the enzymatic activity of
NSUN2 (Extended Data Fig. 9e–g and Supplementary Tables 10a–c).
Consequently, motility and adhesion were reduced and differentiation
increased in primary human keratinocytes when NSUN2 was repressed
or enzymatically dead versions were overexpressed (Extended Data Fig. 9h–m).

Thus, the undifferentiated stem cell state in Nsun2-deficient tumours 
was primarily driven by differential translation of proteins regulating 
cell migration, adhesion and stress responses (Extended Data Fig. 10a, b 
and Supplementary Fig. 1). 


Low translation impairs stress responses 

To test whether the stress-related programme in Nsun2−/− tumours
altered their sensitivity to external stress in vivo, we applied the
cytotoxic agent 5-fluorouracil (5FU). 5FU is commonly used to treat
squamous cell carcinomas. While wild-type tumours only showed
a mild reduction in growth, 5FU treatment blocked progression of
Nsun2−/− tumours (Fig. 6a and Extended Data Fig. 10c, d). Nsun2−/−
tumour cells were unable to re-enter the cell cycle after drug treatment,
despite induction of p53 being detectable in all samples (Fig. 6b, c and
Extended Data Fig. 10e, f). We obtained similar results using cisplatin
(Extended Data Fig. 10g–i). 5FU-treated Nsun2−/− tumour cell
layers were reduced, and the remaining ITGA6+ basal cells unusually
co-expressed the differentiation marker K10 (Fig. 6d, arrows). Thus,
Nsun2-deficient tumours fail to activate survival pathways in response
to stress.
Finally, we asked whether the increased sensitivity to 5FU depended
on angiogenin-mediated cleavage of non-methylated tRNAs. We
rescued tRNA cleavage by administering the angiogenin inhibitor
N65828 (refs 1, 50). The high toxicity of this drug combination only
allowed treatment for up to 7 days. Nevertheless, Nsun2−/− tumours
failed to regress, and the survival of undifferentiated tumour-initiating
cells (CD34+/ITGA6^high) significantly increased when they were
exposed to both drugs (Fig. 6e, f), indicating that tRNA fragments
reduce the survival of Nsun2−/− tumour-initiating cells.

In conclusion, combining cytosine-5 RNA methylation inhibitors 
with conventional chemotherapeutic agents may provide an effective 
anti-cancer strategy for solid tumours (Extended Data Fig. 10j).


Discussion 

Similar to the haematopoietic system, epidermal stem cells produce
less protein than their immediate progenitors, and forced entry into the 
cell cycle is not sufficient to reverse this translation repression. Instead, 
global protein synthesis in normal and tumour cells is determined by 
lineage commitment, but not by proliferation. 

We identify RNA methylation as an important pathway to modu- 
late global protein synthesis and cell fate. Both protein synthesis and 
NSUN2 expression are low in epidermal stem cells, but increase upon 
commitment to differentiate. NSUN2-mediated methylation protects 
tRNA from cleavage into non-coding 5’ tRNA fragments, thereby promoting
protein translation and differentiation. External stress stimuli
inhibit NSUN2 activity, permitting cleavage into 5’ tRNA fragments,
which then decrease protein synthesis in human cells. Inhibition of
post-transcriptional methylation in squamous tumours promotes stem 
cell function and tumorigenesis. However, re-activation of cytosine-5 
RNA methylation pathways is required to exit the specific transla- 
tion inhibition programme after cytotoxic stress. Thus, activation of 
RNA methylation or inhibition of tRNA cleavage is essential for cell 
survival of tumour-initiating cells in response to cytotoxic stress (see 
Supplementary Discussion). 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Transgenic mice. Rosa-CAG-LSL-tdTomato (ref. 51), K19-CreER (ref. 52) and Lgr5-CreERT2
(ref. 53), Nsun2−/− (or homozygous Nsun2Gt(D014D11)Wtsi; ref. 1), and K5-Sos-F
(in a wa2/wa2 background) mutant mice have been described previously. Balb/C
athymic nude mice purchased from Charles River were used in transplantation 
experiments. All mice were housed in the Wellcome Trust—Medical Research 
Council Cambridge Stem Cell Institute Animal Unit. All mouse husbandry and 
experiments were carried out according to the local ethics committee under the 
terms of a UK Home Office license PPL80/2231 and PPL80/2619. 

To conditionally induce tdTomato-reporter lines for expression of Cre-recombinase,
Rosa-CAG-LSL-tdTomato mice were crossed with K19-CreER or
Lgr5-CreERT2 mice. To activate CreER, only male mice were treated with two
intraperitoneal (i.p.) injections of 50 μl of a tamoxifen (Sigma-Aldrich) solution
(40 mg ml−1) in corn oil at postnatal days 15 and 17.

To measure protein synthesis in vivo, mice were i.p. injected with O-propargyl-puromycin
(OP-puro) (Medchem Source LLP) at a concentration of 50 mg per kg
of body weight dissolved in PBS at pH 6.4–6.6, 1 h before being killed. Skin samples
were collected at catagen (postnatal day (P)17), telogen (P19), early anagen (P25)
and late anagen (P30).

To induce squamous tumours in an Nsun2−/− background, we used K5-Sos
transgenic mice. These mice express a dominant-negative form of son of sevenless
(SOS) under control of the keratin 5 (K5) promoter and develop spontaneous
cutaneous tumours with 100% penetrance. K5-Sos wa2/wa2 mice were crossed
with Nsun2+/− mice carrying a gene trap in the Nsun2 allele (Nsun2Gt(D014D11)Wtsi).
Spontaneous skin papillomas developed mainly in the tail of K5-Sos wa2/+ mice
2 weeks after birth but they did not develop into malignant squamous cell carcinomas.
Histology, tissue and cell stainings, antibodies and imaging. Tissues or tumours 
were either embedded in OCT and frozen or fixed overnight with 4% paraformal- 
dehyde, transferred to 70% EtOH and embedded in paraffin. Samples were then 
cut at 4 μm (paraffin) or 10 μm (frozen). Immunofluorescence staining, LacZ and
haematoxylin and eosin staining of frozen or paraffin-embedded tissues or cells
were performed as described previously. For immunohistochemistry, ImmPRESS
reagents (Vector Labs) or IHC Detection Kit (Ventana Medical Systems) and 
DISCOVERY automated IHC staining system (Ventana Medical Systems) were 
used. 

Primary antibodies were used at the following dilutions: rabbit polyclonal to 
RFP (for tdTomato) (1:1,000; Rockland, 600-401-379), mouse monoclonal to 
DLX3 (1:200, Abnova, H00001747-A01), rabbit polyclonal to K6 (1:200, Abcam, 
ab24646), mouse monoclonal to GATA3 (1:50, Santa Cruz Biotech, sc-268), 
guinea pig polyclonal to K31 and K72 (1:200, Progen, GP-hHa1 and GP-K6irs2),
rabbit polyclonal anti c-MAF (1:100, Bethyl, A300-613A), mouse monoclonal to 
LEF1 (1:50, Santa Cruz Biotech, sc-81470), goat polyclonal to P-cadherin (1:100, 
R&D systems, FAB761A), rabbit monoclonal antibody to Ki67 (1:200; SP6, 
Vector Labs, VP-RM04), mouse monoclonal anti-mouse K15 (1:1,000; ref. 54), 
β-catenin (1:200, Santa Cruz Biotech, sc-7199), rat monoclonal anti-ITGB1 (1:200,
clone HMB1-1, BioLegend, 102203), rabbit polyclonal anti-mouse K10 (1:500; 
Covance, PRB-159P), rabbit polyclonal anti-NSUN2 (1:500; Aviva Systems Biology,
ARP48811_P050), rabbit polyclonal anti-human NSUN2 (MetA, 1:500; ref. 31), 
rat monoclonal anti-ITGA6 (1:500; GoH3, eBioscience, 14-0495), rat monoclonal 
anti-CD44 (1:200, IM7, BioLegend, 103004), anti-mouse podoplanin (1:500, clone 
8.1.1, eBioscience, 14-5381), rat monoclonal anti-BrdU (1:100; Abcam, ab6326), 
rabbit polyclonal anti-laminin-α5 (1:100, Abcam, ab75344), mouse monoclonal
anti-cytokeratin 8 (1:100, TROMA-I, DSHB, US), rabbit polyclonal to Slug (1:200, 
Cell Signaling, 9585P), chicken polyclonal anti-GFP (1:200, Abcam, ab13970), 
rabbit polyclonal anti-p53 (1:100, CM5, Novocastra, NCL-p53-CM5p), and rabbit 
anti human involucrin (SY5 clone, 1:200, Abcam, ab80530). Alexa Fluor 555-, 
Alexa Fluor 647- and Alexa Fluor 488-conjugated secondary antibodies (Thermo 
Fisher Scientific) were added at a dilution of 1:1,000 for 1h at room temperature. 
Apoptotic cells were visualized by staining sections with the DeadEnd Fluorometric
TUNEL System (Promega) or by immunostaining with rabbit polyclonal anti-cleaved
caspase-3 (1:200, Cell Signaling, 9664). Nuclei were labelled with DAPI or haematoxylin.
Slides were mounted in glycerol supplemented with Mowiol 4-88 mounting
medium (Sigma-Aldrich). 

White field images were acquired using an Olympus IX80 microscope and 
a DP50 camera. Fluorescence images were acquired either on a Zeiss Axioplan 
microscope or using a confocal microscope (Leica TCS SP5) at 1,024 x 1,024 dpi 
resolution. All the images were further processed with Photoshop CS5 (Adobe) 
software. 

Isolation of mouse keratinocytes from normal skin and skin tumours. To isolate 
keratinocytes from mouse back skin, shaved skin was floated on 0.25% trypsin 
without EDTA (Thermo Fisher Scientific) for 2h at 37°C. Then the epidermis 
was scraped off the dermis, and cells were disaggregated by gentle mincing with a
scalpel and pipetting. For back skin in late anagen the dermis was further minced
and digested for 30 min at 37°C in low-calcium medium containing 1.25 mg ml−1
of collagenase type I, 0.5 mg ml−1 of collagenase type II, 0.5 mg ml−1 of collagenase
type IV (all from Worthington) and 0.1 mg ml−1 of hyaluronidase (Sigma-Aldrich).

To disaggregate cells from squamous tumours, the tumours were minced with
a scalpel and incubated for 1–2 h at 37°C in low-calcium medium containing
1.25 mg ml−1 of collagenase type I, 0.5 mg ml−1 of collagenase type II, 0.5 mg ml−1
of collagenase type IV (all from Worthington) and 0.1 mg ml−1 of hyaluronidase
(Sigma-Aldrich). Then pieces were further incubated for another hour with trypsin
without EDTA (Thermo Fisher Scientific) and cells were disaggregated by scraping
with a scalpel blade. Trypsin was inactivated by washing the cell suspension with
low-calcium medium containing 10% FBS (Thermo Fisher Scientific).
In vivo measurement of protein synthesis by flow cytometry and microscopy. 
Quantification of protein synthesis from back skin was performed using only male 
mice. Back skin or cutaneous tumours were collected and further processed for 
flow cytometry or histology analysis (described earlier). For flow cytometry anal- 
ysis cells were dissociated as described earlier. For staining of dissociated cells or 
frozen sections, samples were first fixed with 1% paraformaldehyde in PBS for 
15min on ice. Next samples were washed in PBS, and then permeabilized in PBS 
supplemented with 3% fetal bovine serum (Sigma-Aldrich) and 0.1% saponin 
(Sigma-Aldrich) for 5 min at room temperature. To conjugate OP-puro to a fluorochrome,
an azide-alkyne cycloaddition was performed using the Click-iT Cell
Reaction Buffer Kit (Thermo Fisher Scientific) and 5 μM of Alexa Fluor 488 or
Alexa Fluor 647 conjugated to azide (Thermo Fisher Scientific). After the 30-min
reaction, the cells were washed twice in PBS with 3% fetal bovine serum and 0.1% 
saponin and then resuspended in PBS. When indicated cells were further stained 
for cell surface markers and DAPI as described later. 

To visualize protein synthesis together with antibody staining in frozen or
paraffin-embedded skin or tumour sections, frozen sections were first fixed with 1%
paraformaldehyde in PBS for 15 min, and paraffin sections were first de-waxed and
progressively rehydrated. Sections were then blocked and stained with primary
antibodies overnight at 4°C. Next, the sections were washed and stained with
secondary antibodies for 1 h at room temperature. After washes, sections were
stained using the Click-iT Cell Reaction Buffer Kit with Alexa Fluor-647 or -488
azide (Thermo Fisher Scientific) as described earlier.

Quantification of protein synthesis rates. Protein synthesis rates in specific 
cell populations were calculated by normalizing the mean of OP-puro signal of 
each population of interest to the signal of the whole epidermal or tumour cell 
preparation, using the following formula: mean OP-puro (population of interest) / mean
OP-puro (all epidermal or tumour cells; CD117-negative, CD31-negative, CD45-negative). The mean of
OP-puro incorporation was averaged from several mice collected in multiple 
independent experiments. OP-puro fluorescence signal between experiments was 
calibrated by including in each run BD rainbow calibration particles 8 peaks (BD 
Bioscience). Samples from PBS-injected mice were also stained for OP-puro and 
the fluorescence signal was used to determine the background signal. 
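
As an illustration of the normalization described above, the following minimal sketch computes the relative protein synthesis rate of one population. The function name, the list-based inputs and the optional subtraction of a PBS-derived background value are assumptions made for this example and are not taken from the original analysis.

```python
from statistics import mean


def relative_protein_synthesis(population_signal, all_cells_signal, background=0.0):
    """Mean OP-puro signal of a population relative to all lineage-negative cells.

    population_signal: OP-puro intensities of the population of interest.
    all_cells_signal:  OP-puro intensities of all epidermal or tumour cells
                       (CD117-, CD31-, CD45-negative).
    background:        hypothetical background value (for example from
                       PBS-injected mice); subtracting it is an assumption.
    """
    population_mean = mean(population_signal) - background
    reference_mean = mean(all_cells_signal) - background
    if reference_mean <= 0:
        raise ValueError("reference signal must exceed the background")
    return population_mean / reference_mean


if __name__ == "__main__":
    # Made-up intensities, for illustration only.
    print(relative_protein_synthesis([200, 400], [100, 200, 300]))
```

A value above 1 would indicate that the population incorporates more OP-puro than the average cell of the preparation.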

Flow cytometry and cell cycle analysis. Flow cytometry was performed for cells 
dissociated from normal skin, skin tumours or cells growing in culture. Cell disso- 
ciation from skin or tumours was performed as described earlier. Cells in culture 
were trypsinized for 5 min before staining. For analysis of specific 
epidermal or tumour populations, live cells or fixed cells previously stained for 
OP-puro as indicated earlier were incubated in 2% BSA with combinations of 
antibodies to the following cell-surface markers: PE-Cy7-conjugated CD117 (1:100, 
clone 2B8, BD Bioscience, 558163), PE-Cy7-conjugated CD31 (1:50, PE-CAM1, 
eBioscience, 563651), PE-Cy7-conjugated CD45 (1:100, BD Pharmingen, 552848), 
PE- or eFluor-450-conjugated ITGA6 (1:500, clone GoH3, eBioscience, 12-0495 
and 48-0495), eFluor-660- or FITC-conjugated CD34 (1:50, RAM34, eBioscience, 
50-0341 and 11-0341), biotinylated CD44 (clone IM7, BioLegend, 103004) and 
PE-conjugated PDPN (eBioscience, 12-5381-82). After incubation for 30 min 
at 4°C, cells were washed twice in PBS. Biotinylated antibodies were visualized 
by incubation with Alexa Fluor-488-conjugated streptavidin (Thermo Fisher 
Scientific) for 10 min. For cell cycle analysis cells were further stained with DAPI. 
tdTomato+ cells were detected using the PE-Texas Red channel. 

Cells were gated using forward versus side scatter to eliminate debris and aggre- 
gates. Surface markers CD117, CD31 and CD45 were used to gate out endothelial 
cells, melanocytes and blood cells when analysing cell preparations from skin or 
tumours. Data acquisition was performed on a BD LSRFortessa analyser (BD 
Biosciences). Data were analysed by FlowJo software. 

Measurement of tumour growth and tumour treatments. Both male and female 
mice were used in these experiments. To evaluate the effect of Nsun2 deletion on 
the formation of skin tumours, we measured the presence (number) of tumours, the 
growth of the tumours, the percentage of mice with tumours and the survival 
of the mice throughout the length of the experiment (approximately 6-8 weeks). 
To monitor tumour occurrence and growth, mice were weighed, all tumours formed 
over the whole body were counted and the growth of each tumour was 
monitored every other day from P16 (the earliest time at which K5-Sos-F mice start 
developing papillomas). Papillomas in the tail tended to fuse into one covering the 
whole tail, and were therefore counted as one tumour from the beginning of the 
experiments. Other tumours also developed in ears, mouth, back skin or feet. 
The growth of each tumour was monitored by measuring the diameter of the widest 
area of the tumour using a precision calliper able to discriminate size changes 
>0.1 mm. When animals were treated with drugs, experiments also started 
at the third week of age. The end point of the experiments was determined 
by health deterioration and casualties or by the length of the treatments when 
mice were under a treatment regime. K5-Sos × Nsun2+/+ mice (referred to as K5-Sos/ 
Nsun2+/+) survived longer than K5-Sos × Nsun2−/− (K5-Sos/Nsun2−/−) and 
K5-Sos × Nsun2+/− (K5-Sos/Nsun2+/−) mice. All mouse tumour experiments 
were carried out according to the local ethics committee under the terms of a UK 
Home Office license PPL80/2231 and PPL80/2619. Following these regulations 
the mean diameter of a tumour should not normally exceed 1.4cm (PPL80/2619, 
19b 7). While K5-Sos/Nsun2−/− and K5-Sos/Nsun2+/− mice had to be killed before they 
reached 2 months of age due to the size and aspect of the tumours and weight loss 
or general health deterioration due to excessive tumour burden, K5-Sos/Nsun2+/+ mice 
only reached the deterioration state later than 2 months of age. For the analysis 
in Fig. 4a we measured the percentage of mice with tumours for each indi- 
cated day. The average number of tumours in each mouse genotype is shown in 
Extended Data Fig. 5c. Note that the data series are shorter for K5-Sos/Nsun2−/− and 
K5-Sos/Nsun2+/− mice because their survival was shorter. The diameter of the tumours was 
normalized to the size of each mouse (body weight) in Extended Data Fig. 5b 
to eliminate genotype variance because K5-Sos/Nsun2−/− mice are significantly 
smaller than K5-Sos/Nsun2+/+ mice. 

Cutaneous tumours in transgenic K5-Sos/Nsun2+/+ and K5-Sos/Nsun2−/− mice 
were topically treated with 5-fluorouracil (5FU) (Efudix 5% Fluorouracil Cream, 
Meda Pharmaceuticals) every second day for 2 weeks. 5FU inhibits thymidylate 
synthases leading to the upregulation of p53 and cell death*’. Tumours were also 
treated with 5FU in combination with an angiogenin inhibitor (AI; N65828, NCI, 
US)! administered by i.p. injection at 2 mg kg−1 in PBS (pH 7.4) on alternate 
days to 5FU treatment. Owing to the high toxicity of the drug combination, we 
were only able to simultaneously treat with 5FU and AI for a short period of time 
(up to 7 days). Cisplatin (CDDP) (Sigma) was dissolved in PBS and injected 
intraperitoneally at 14 mg kg−1 every other day. All treatments started when the first 
cutaneous lesions appeared and the end point was indicated by the length of the 
treatment, after which all mice were killed. Control mice were administered PBS. 

BrdU and EdU labelling. To measure proliferation, K5-Sos mice were injected 
i.p. with 50 mg of BrdU per kg of body weight and, 23 h later, with 20 mg kg−1 of EdU. 
One hour later mice were killed and tumour samples were processed for histology 
as described previously. BrdU was visualized as described previously!!. EdU was 
stained with Click-iT EdU Alexa Fluor 488 Imaging Kit (Thermo Fisher Scientific). 
Images of random areas of the slide were collected using a confocal microscope 
(Leica SP5). Numbers of BrdU- and EdU-positive cells were quantified using 
Volocity software (PerkinElmer). 

Tumour graft assay. Epidermal keratinocytes from K5-Sos/Nsun2+/+ and 
K5-Sos/Nsun2−/− cutaneous tumours were isolated as described earlier. GFP- 
expressing dermal fibroblasts were isolated from healthy skin of newborn APC- 
eGFP mice. For this, skin was first incubated in a 1:1 solution of 5% dispase (BD 
Biosciences) at 37°C for 1h. The epidermis was then peeled from the dermis. 
The dermis was incubated with 0.2% collagenase in low-calcium medium 
for 30 min at 37°C to yield a single-cell suspension. The dermis suspension was 
filtered through a 70-μm cell strainer. Viable epidermal keratinocytes (1 × 10^6) 
from K5-Sos/Nsun2+/+ or K5-Sos/Nsun2−/− tumours were injected subcutaneously 
in athymic nude mice. To allow successful engrafting, the tumour cells were 
injected together with 1 × 10^6 viable GFP-expressing dermal fibroblasts. The GFP-expressing dermal 
fibroblasts integrated into the dermis but failed to contribute to tumour formation 
(Extended Data Fig. 6c). Experiments were done in triplicate. Nude mice 
were killed 1 month after transplantation and tumour size was measured with a 
calliper. 

RNA extraction and quantitative RT-PCR. Total RNA from mouse skin tumours 
was prepared using Trizol reagent (Thermo Fisher Scientific) and double-stranded 
cDNA was generated with Superscript III First-Strand Synthesis kit (Thermo 
Fisher Scientific). Real-time PCR amplification and analysis were conducted 
on a StepOne Real-Time PCR System (Applied Biosystems) using pre-designed 
probe sets and TaqMan Fast Universal PCR Master Mix (2×) (Applied 
Biosystems). The following probes were used to amplify selected genes: Nsun2 
(Mm00520224_m1), α6 integrin (Mm01333831_m1), Cd34 (Mm00519283_m1), 
Krt10 (Mm03009921_m1) and involucrin (Mm00515219_s1). Gapdh expression 
(4352932E) was used to normalize samples using the ΔCt method. 
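
The normalization to Gapdh can be sketched as follows. This is a generic ΔCt calculation, not the authors' script. The function names are hypothetical, and the conversion to relative expression assumes a perfect doubling of product per PCR cycle.

```python
def delta_ct(ct_target, ct_reference):
    """Ct of the gene of interest minus Ct of the reference gene (Gapdh here)."""
    return ct_target - ct_reference


def relative_expression(ct_target, ct_reference):
    """Expression relative to the reference gene, as 2^(-delta Ct).

    Assumes 100% amplification efficiency (product doubles every cycle),
    which is an assumption of this sketch.
    """
    return 2.0 ** (-delta_ct(ct_target, ct_reference))


if __name__ == "__main__":
    # Made-up Ct values, for illustration only.
    print(relative_expression(ct_target=24.0, ct_reference=20.0))
```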

Mouse SCC and TMA staining and quantification. Mouse squamous cell carci- 
nomas (SCCs) were obtained from a TPA/DMBA chemical induction treatment 
for 20 weeks and frozen tissues were provided by C. Blanpain. Frozen sections were 
stained as described earlier. 

Tissue microarrays (TMAs) for human skin tumours of increasing malignancy 
according to the TNM classification were purchased from Abcam (ab178287 and 
ab178288). Immunohistochemistry was performed using IHC Detection Kit 
(Ventana Medical Systems) and DISCOVERY automated IHC staining system 
(Ventana Medical Systems) with a polyclonal antibody for human NSUN2 
(MetA, 1:1,000; ref. 31). Images of each tumour section were acquired with a Zeiss 
Axioplan microscope and NSUN2 expression levels were quantified for each indi- 
vidual cell in each tissue (quantified as average between all cells in all sections) 
using CellProfiler image software. 

Cell culture, viral infections and siRNA knockdown. Four lines of human dermal 
fibroblasts were used. Two independent cell lines of NSUN2−/− human dermal 
fibroblasts were derived from two patients and are referred to as NSUN2−/− line 1 and 
line 2 in this study, and one line of NSUN2+/− fibroblasts was derived from the 
mother of the patients described previously’; these three lines were provided by 
J. Gleeson. NSUN2+/+ human dermal fibroblasts were purchased from Thermo 
Fisher Scientific (C-013-5C) and were derived from an age- and gender-matched 
individual compared to NSUN2+/− fibroblasts. Human fibroblasts were grown 
in MEM (Thermo Fisher Scientific) supplemented with 20% fetal bovine serum 
(FBS) as described previously’. ZHC human epidermal keratinocytes (Cellworks 
distributed, ZHC-1116) were grown in KBM-Gold medium (Lonza). All cells were 
kept in a humidified atmosphere at 37 °C and 5% CO2. 

To rescue expression of NSUN2 or express NSUN2 catalytically dead mutants 
in NSUN2−/− fibroblasts or ZHC keratinocytes, full-length human NSUN2 
(pB-NSUN2), inactive mutants C271A (pB-NSUN2-C271A), K190M (pB-NSUN2-K190M) 
or C321A (pB-NSUN2-C321A) or the empty vector (pB-empty) were 
delivered by retroviral infection as described previously”. To knock down NSUN2 
expression in human keratinocytes, cells were transfected with control siRNA (AllStars 
negative control siRNA; Qiagen, 1027292) or human NSUN2 short interfering 
RNA (siRNA) (FlexiTube siRNA; Qiagen, SI02655548) using Lipofectamine RNAiMax 
transfection reagent (Thermo Fisher Scientific) according to the manufacturer's 
instructions. 

Migration assays. For migration analysis in Boyden chambers, human primary 
keratinocytes were transfected with siRNAs as described earlier. Transfections 
were carried out twice, 48 h apart, and the migration assay was performed 24 h after the 
second transfection. Cells were treated with mitomycin C for 2 h to arrest cell 
cycle progression. After mitomycin C treatment, cells were trypsinized, counted 
and seeded on Boyden chambers (transwell inserts of 8 μm, 24-well plates, BD 
Biosciences). 8 × 10^4 cells were seeded in KBM growing medium (Lonza) 
without human recombinant (hr)EGF. Medium containing 10 ng ml−1 of hrEGF 
(Lonza) was placed under the transwell inserts as a chemoattractant. 
Medium without chemoattractant was placed under the transwell inserts in control 
experiments. Cells were allowed to migrate for 6 or 12 h, after which the inserts 
were washed once with PBS, fixed with 4% PFA for 10 min, and cells were stained 
with DAPI. Cells from the upper side of the membrane were scraped off with a 
cotton bud and washed off with PBS several times. Cells on the bottom side of the 
membrane were imaged with a colony scan microscope. Cells were then quantified 
with the software CellProfiler. 

For motility analysis of human keratinocytes, 10^4 cells were seeded in 24-well 
ImageLock plates (Essen Instruments) in growing medium and kept for 26 h at 
37°C in 5% CO2. Cell mobility was recorded with an automated IncuCyte 
microscope (Essen Instruments). Images were collected at 15-min intervals. Two- 
dimensional cell migration was analysed by using the MTracking function of 
ImageJ software. Two-dimensional migration tracks were generated by manually 
tracing the nucleus of each cell. Migrated distance was obtained by measuring 
the linear distance travelled between the first and last position (after 26h) of each 
tracked cell. 
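
The migrated distance defined above (straight line between first and last tracked position) can be illustrated with a short sketch. The track format, a list of (x, y) nucleus coordinates in consistent units, and the function name are assumptions made for this example.

```python
import math


def migrated_distance(track):
    """Linear distance between the first and last position of one tracked cell.

    track: list of (x, y) nucleus positions, one per time point
           (hypothetical format; units are whatever the tracker exports).
    """
    if len(track) < 2:
        raise ValueError("a track needs at least two positions")
    (x_first, y_first), (x_last, y_last) = track[0], track[-1]
    return math.hypot(x_last - x_first, y_last - y_first)


if __name__ == "__main__":
    # Intermediate positions do not affect the result; only the end points do.
    print(migrated_distance([(0.0, 0.0), (5.0, 5.0), (3.0, 4.0)]))
```

Because only the end points are used, this measure reports net displacement rather than the total path length travelled.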

tRNA sequencing library preparation. Small RNA libraries were generated from 
snap-frozen skin papillomas from 4-week-old mice. Four independent biological 
replicates were used. For tRNA library generation we followed the protocols 
described previously’. Briefly, total RNA was extracted using Trizol reagent 
(Thermo Fisher Scientific) and treated with DNase (Turbo DNase, Thermo 
Fisher Scientific). Ribosomal RNA was removed with Ribo-zero (Epicentre, 
Illumina). The remaining RNA fraction was size-selected using MirVana Isolation 
Kit (Thermo Fisher Scientific). Using MirVana RNA purification columns with 
two sequential filtration steps with different ethanol concentrations, an RNA 
fraction highly enriched in RNA species <200 nucleotides was obtained. The 
small RNA fraction (approximately 200 ng) was first treated with 0.1 M Tris-HCl 
pH 9.0 and 1 mM EDTA for 30 min at 37 °C to de-aminoacylate mature tRNAs 
and was later treated with T4 PNK (NEB) to ensure phosphorylated 5′ ends and 3′ OH 
ends before RNA adaptor ligation and library preparation. tRNA libraries 
were generated using TruSeq Small RNA Preparation Kit (Illumina). Briefly, 
3’ adenylated and 5’ phosphorylated adapters suitable for Illumina RNA sequenc- 
ing were ligated to the small RNA fraction. RNA was reverse-transcribed at 50°C 
for 1h (SuperScript III cDNA synthesis kit, Thermo Fisher Scientific), followed by 
PCR amplification with Phusion DNA polymerase (Thermo Fisher Scientific). All 
samples were multiplexed and sequenced in HiSeq platform (Illumina). 

Bisulfite sequencing library preparation. To generate bisulfite sequencing libraries, 
RNA was prepared and bisulfite treated as described previously’. Briefly, RNA 
was extracted with Trizol reagents (Thermo Fisher Scientific) from snap-frozen 
skin tumours from 4-week-old mice. Four independent biological replicates were 
used. Total RNA was extracted and DNase treated (Turbo DNase, Thermo Fisher 
Scientific). Samples were Ribo-zero treated (Epicentre, Illumina) to deplete them 
of rRNA. At least 1.5 μg of the remaining RNA fraction was bisulfite converted 
as follows: RNA was mixed in 70 μl of 40% sodium bisulfite pH 5.0 and DNA 
protection buffer (EpiTect Bisulphite Kit, Qiagen). The reaction mixture was incubated 
for three to four cycles of 5 min at 70°C followed by 1h at 60°C and then desalted 
with Micro Bio-spin 6 chromatography columns (Bio-Rad). RNA was desulfonated 
by adding an equal volume of 1 M Tris (pH 9.0) to the reaction mixture and 
incubated for 1 h at 37°C, followed by ethanol precipitation. 2′,3′-Cyclic phosphate and 
5′-hydroxyl termini produced during the bisulfite/desulfonation reaction were 
end-repaired with T4 PNK (New England Biolabs). About 120 ng of bisulfite- 
converted RNA was used to generate bisulfite-seq (BS-Seq) libraries. Because 
bisulfite treatment and desulfonation cleaves long RNAs into fragments of about 
100 nucleotides, we then used TruSeq Small RNA preparation kit (Illumina) to 
generate libraries suitable for Illumina sequencing as described previously’. Briefly, 
RNA adapters suitable for Illumina sequencing were ligated to bisulfite-converted 
RNAs, reverse transcribed at 50°C for 1 h with SuperScript III and 2 mM of each 
dNTP (SuperScript III cDNA synthesis kit, Thermo Fisher Scientific) followed 
by PCR amplification. All samples were multiplexed and sequenced on a HiSeq 
platform (Illumina). 

Preparation of Ribo-seq libraries. Two types of experiments were performed 
on mouse skin tumours (from K5-Sos mice) and human dermal fibroblasts sam- 
ples from each set of conditions: ribosomal profiling (Ribo-seq) and mRNA-seq. 
All samples were sequenced using the HiSeq platform (Illumina). A minimum of 
three replicates was performed for each sample. Dermal fibroblasts were grown 
and infected when indicated as described earlier and with the constructs 
indicated in each experiment. Neither cells nor tissues were pre-treated with 
cycloheximide before collection; however, cycloheximide was present in the following steps. Cells 
were washed twice with PBS (without cycloheximide) and lysis buffer (20 mM 
Tris-Cl (pH 7.4), 150 mM NaCl, 5 mM MgCl2, 1 mM dithiothreitol (DTT) (Sigma), 
1% Triton X-100 (Sigma), 25 U ml−1 of Turbo DNase I (Thermo Fisher Scientific)) 
containing 100 μg ml−1 of cycloheximide (Sigma) was added straight to the 
cells. Papillomas were dissected from the mice, snap frozen in liquid nitrogen 
and later homogenized in lysis buffer containing 100 μg ml−1 of cycloheximide. 
Cycloheximide was added to the lysis buffer to arrest translation elongation while 
the polysome fraction was being purified and mRNA fragments were recovered. 
We then proceeded with ribosome footprint purification without snap freezing 
the lysates as indicated previously>®. All the steps for cell or tissue lysis, nucle- 
ase footprinting, polysome fractionation, mRNA footprint purification and gel 
size-selection were performed as indicated previously”. Briefly, lysates were 
further triturated by passing them ten times through a 26-G needle. Nuclei and 
debris were removed by centrifugation at 13,000 r.p.m. for 10 min. Supernatant was 
digested with RNase I (100 U μl−1, Thermo Fisher Scientific) for 45 min at room 
temperature. Digestion was blocked with SuperaseIN (Thermo Fisher Scientific) 
and lysate was layered on a 1 M sucrose cushion and separated by ultracentrifugation 
at 45,000 r.p.m. in a 70Ti rotor for 9 h at 4°C. Pellets were resuspended in 
Qiazol and RNA fragments shorter than 200 nucleotides were extracted using the 
miRNeasy kit (Qiagen) followed by ethanol precipitation. Size selection of footprints 
with a length of 26-34 nucleotides was performed on 15% TBE-urea gel (Thermo 
Fisher Scientific). Footprints were 3′-dephosphorylated with T4 polynucleotide 
kinase (10 U, NEB). From this step and to prepare libraries suitable for Illumina 
sequencing we slightly modified the original protocol®®. mRNA footprints were 
then treated with Ribo-zero (Epicentre Illumina) to deplete rRNA. By using this 
extra step of rRNA depletion (together with the use of DNA oligonucleotides to 
deplete rDNA by subtractive hybridization in a later step) we were able to reduce 
rRNA contamination to only ~60% of all reads. mRNA footprints recovered from 
Ribo-zero were then prepared for Ribo-seq using TruSeq Small RNA Preparation 
Kit (Illumina). Briefly, 3’-adenylated and 5’-phosphorylated adapters suitable 
for Illumina RNA sequencing were ligated to the recovered mRNA fragments. 
RNA was reverse transcribed at 50°C for 1h (SuperScript III cDNA synthesis kit, 
Thermo Fisher Scientific), followed by rDNA depletion by subtractive hybridiza- 
tion (as indicated in the original protocol) using oligonucleotides listed previously 
and following the protocol recommendations. Recovered complementary DNAs 
were PCR amplified with no more than 12 PCR cycles. All samples were multi- 
plexed and sequenced in HiSeq platform (Illumina). 

Preparation of mRNA-seq libraries. mRNA-seq libraries were generated from 
mouse skin tumours from 4-week-old K5-Sos/Nsun2+/+ and K5-Sos/Nsun2−/− 
mice, from healthy mouse back skin from 3.5-4-week-old Nsun2+/+ or Nsun2−/− 
mice (without tumours) and from human dermal fibroblasts (NSUN2+/+, 
NSUN2+/− and NSUN2−/−) growing in culture and infected when indicated. 
At least four replicates were performed for each sample. All samples were mul- 
tiplexed and sequenced using the HiSeq platform (Illumina). Total RNA was 
extracted using Trizol (Thermo Fisher Scientific) from cells in culture or snap-fro- 
zen tissues. Total RNA was DNase (Turbo DNase, Thermo Fisher Scientific) and 
Ribo-zero (Epicentre, Illumina) treated. rRNA-depleted RNA was used to generate 
mRNA-seq libraries using NEXTflex Directional RNA-seq Kit V2 (Illumina). All 
samples were multiplexed and sequenced in HiSeq platform (Illumina). 

Next-generation sequence data analyses. For all data analyses, FastQC was used 
for the initial assessment of the quality and basic processing of the reads (http:// 
www.bioinformatics.babraham.ac.uk/projects/fastqc). Sequencing adapters were 
trimmed from the 5’ and the 3’ ends of the reads using cutadapt (v.1.4.2; https:// 
pypi.python.org/pypi/cutadapt/1.4.2). 

RNA BS-seq analysis. To determine RNA methylation levels in mouse K5-Sos 
tumours, two complementary protocols for the analysis of BS-seq data were used. 
Alignment to the genome. BS-seq reads were aligned to the mouse reference genome 
(GRCm38/mm10) with Bismark (http://www.bioinformatics.babraham.ac.uk/ 
projects/bismark; v.0.13.1; options: '--directional -n 0 -l 40'). Methylation levels 
for all cytosines with coverage of at least 5 reads (5× coverage) in both 
K5-Sos/Nsun2+/+ and K5-Sos/Nsun2−/− tumour samples were inferred with 
Bismark 'methylation_extractor'. Cytosine positions displaying a difference in 
RNA methylation of at least 10% between K5-Sos/Nsun2+/+ and K5-Sos/Nsun2−/− 
tumour samples were extracted based on the ENSEMBL (GRCm38, Release 74; 
http://www.ensembl.org/info/data/ftp) transcript annotations and tRNA gene 
predictions in the mouse (GRCm38/mm10) reference genome obtained from 
GtRNAdb (http://lowelab.ucsc.edu/GtRNAdb). 
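
The coverage and difference thresholds described above can be illustrated with a small sketch. It is not the pipeline used in the study: the input format (a dictionary mapping each cytosine position to counts of unconverted C and converted T reads) and all names are assumptions made for this example.

```python
def differential_sites(wild_type, knockout, min_coverage=5, min_difference=0.10):
    """Return cytosines whose methylation differs between two samples.

    wild_type, knockout: dicts mapping a position label to a (C, T) tuple,
        where C counts unconverted (methylated) reads and T converted reads.
        This format is hypothetical.
    A site is kept only if both samples reach min_coverage reads and the
    absolute difference in methylation level is at least min_difference.
    """
    selected = {}
    for position in wild_type.keys() & knockout.keys():
        c_wt, t_wt = wild_type[position]
        c_ko, t_ko = knockout[position]
        if c_wt + t_wt < min_coverage or c_ko + t_ko < min_coverage:
            continue  # insufficient coverage in at least one sample
        difference = c_wt / (c_wt + t_wt) - c_ko / (c_ko + t_ko)
        if abs(difference) >= min_difference:
            selected[position] = difference
    return selected


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    wt = {"chr1:100": (8, 2), "chr1:200": (1, 1), "chr1:300": (5, 5)}
    ko = {"chr1:100": (1, 9), "chr1:200": (0, 10), "chr1:300": (5, 5)}
    print(differential_sites(wt, ko))
```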

Alignment to representative transcripts. Sequences for ENSEMBL transcripts and 
tRNAs were extracted in FASTA format. All transcript isoforms were considered, 
and in addition the longest gene at full length including introns was retained as 
a representative sequence to identify RNA methylation sites in introns. Cs were 
converted to Ts in the reference transcript sequences, and in the processed BS-seq 
reads. Alignment of converted BS-seq reads against converted transcript sequences 
was performed using bowtie (v.1.1.1; http://bowtie-bio.sourceforge.net; options 
'-m 500 -v 2 -a --best --strata'). Following alignment, the reads that aligned in 
the sense direction were obtained, and the original transcript sequences and reads 
were used to compile RNA methylation levels (C/(C+T)), considering only cytosines 
with at least 5× coverage. Heatmaps displaying either C or T in the aligned 
reads at each cytosine position were generated using custom PERL scripts and 
matrix2png (http://www.chibi.ubc.ca/matrix2png/) for visualization. Cytosine 
positions on the heatmaps were reported relative to the annotated transcriptional 
start sites of the transcripts. 
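
The two ideas in this paragraph, in silico C-to-T conversion before alignment and calculation of C/(C+T) from the original sequences afterwards, can be sketched as below. The sketch assumes ungapped, sense-strand alignments supplied as (offset, read) pairs. That input format and the function names are illustrative assumptions and not the custom scripts used in the study.

```python
def c_to_t(sequence):
    """In silico bisulfite conversion: replace every C with T."""
    return sequence.upper().replace("C", "T")


def methylation_levels(reference, aligned_reads, min_coverage=5):
    """Compute C/(C+T) at each reference cytosine from original read sequences.

    reference:     original (unconverted) transcript sequence.
    aligned_reads: list of (offset, read) pairs; each read is the original
                   read sequence aligned without gaps at that 0-based offset
                   (hypothetical format).
    """
    counts = {}  # position -> [unconverted C, converted T]
    for offset, read in aligned_reads:
        for index, base in enumerate(read.upper()):
            position = offset + index
            if position >= len(reference) or reference[position].upper() != "C":
                continue
            pair = counts.setdefault(position, [0, 0])
            if base == "C":
                pair[0] += 1
            elif base == "T":
                pair[1] += 1
    threshold = max(min_coverage, 1)  # avoid division by zero
    return {
        position: c / (c + t)
        for position, (c, t) in counts.items()
        if c + t >= threshold
    }


if __name__ == "__main__":
    reads = [(0, "ACGT")] * 4 + [(0, "ATGT")]
    print(c_to_t("ACGC"), methylation_levels("ACGC", reads))
```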

tRNA-seq data analysis. The abundance of tRNA fragments was determined 
according to a previously published protocol!. Adaptor-trimmed tRNA-seq reads 
(>20 nucleotides and <200 nucleotides in length) were mapped to the mouse 
reference genome (GRCm38/mm10) using bowtie (v.1.1.1; http://bowtie-bio. 
sourceforge.net; options '-m 1 -v 2 -a --best --strata') considering only reads that 
map uniquely to the genome. To account for the polymerization of CCA 3’ ends 
onto mature tRNAs, the remaining unmapped reads were trimmed of CCA[CCA] 
ends and realigned using the same options. Annotations were conducted based 
on tRNA genes predicted for the mouse reference genome (GRCm38/mm10) 
and downloaded from GtRNAdb (http://lowelab.ucsc.edu/GtRNAdb). Reads 
that exceeded the annotated tRNA gene start or end by more than 10% were dis- 
carded. All distinct reads, which were shorter than 90% of the annotated tRNA 
gene length, were considered as tRNA fragments. Counts per fragment were normalized, 
and the differential abundances of fragments processed from the 5′ or 3′ 
halves of the tRNAs were statistically evaluated using the R/Bioconductor DESeq 
package (http://bioconductor.org/packages/release/bioc/html/DESeq.html). tRNA 
fragment abundances are given by log2(DESeq-normalized counts). 
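
The read filtering rules above can be expressed as a short sketch. It assumes half-open genomic coordinates and interprets the 10% tolerance as 10% of the annotated tRNA gene length. Both points, like the function name and return labels, are assumptions of this example.

```python
def classify_trna_read(read_start, read_end, gene_start, gene_end):
    """Classify a read mapped to an annotated tRNA gene.

    Coordinates are half-open (start inclusive, end exclusive), an assumption.
    Returns "discarded" if the read extends past the gene start or end by
    more than 10% of the gene length, "fragment" if the read is shorter
    than 90% of the gene length, and "full_length" otherwise.
    """
    gene_length = gene_end - gene_start
    tolerance = 0.10 * gene_length
    if read_start < gene_start - tolerance or read_end > gene_end + tolerance:
        return "discarded"
    if (read_end - read_start) < 0.90 * gene_length:
        return "fragment"
    return "full_length"


if __name__ == "__main__":
    # A 72-nucleotide tRNA gene annotated at positions 100-172.
    print(classify_trna_read(100, 135, 100, 172))
```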

mRNA-seq and Ribo-seq data analyses. Ribosome profiling data were processed 
following established protocols*”. The first 5′ base of the adaptor-trimmed Ribo-seq 
reads was removed, as this is usually an artefact of reverse transcription”®. 
To remove abundant contamination from digested rRNA present in the libraries, 
the reads were aligned to a collection of rRNA sequences obtained from GenBank 
and UCSC using bowtie (options: '-n 2 --seedlen 23'). Reads aligning to rRNA 
were discarded, with the average rRNA contamination per sample being around 
60%. Only reads at least 24 nucleotides and less than 30 nucleotides in length 
were retained in accordance with the observed length distribution of ribosome 
footprints", 
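
The read pre-processing described above (removal of the first 5′ base, then retention of reads of 24 to 29 nucleotides) can be sketched as follows. Applying the length filter after the first base has been removed is an assumption of this example, as is the function name.

```python
def process_footprint(read, min_length=24, max_length=29):
    """Trim the first 5' base and keep the read only if its length is in range.

    Returns the trimmed read, or None if it falls outside
    min_length..max_length (inclusive). Measuring the length after trimming
    is an assumption of this sketch.
    """
    trimmed = read[1:]  # drop the first base, a likely reverse-transcription artefact
    if min_length <= len(trimmed) <= max_length:
        return trimmed
    return None


if __name__ == "__main__":
    reads = ["N" + "A" * 23, "N" + "A" * 24, "N" + "A" * 29, "N" + "A" * 30]
    kept = [r for r in map(process_footprint, reads) if r is not None]
    print([len(r) for r in kept])
```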

Both the Ribo-seq and mRNA-seq reads were aligned to the human (GRCh37/ 
hg19) and to the mouse (GRCm38/mm10) reference genomes using Tophat2 (v.2.1; 
options: '--read-mismatches 1(2) --max-multihits 1 --GTF') guided by ENSEMBL gene 
models (release 76), allowing for two mismatches per read for human and one 
mismatch per read for mouse, and unique alignments only. 

To determine mRNA abundance, mRNA-seq read counts for the full tran- 
script were calculated using htseq-count (http://www-huber.embl.de/HTSeq/ 
doc/overview.html), data sets were normalized, and the statistical significance of 
differential expression was evaluated by using the R/Bioconductor DESeq2 package 
(https://bioconductor.org/packages/release/bioc/html/DESeq2.html). 

To evaluate differences in translation, the following additional Ribo-seq data 
analyses and normalizations were performed. 

Alignment to representative regions. Coding sequences (CDS) and 5’ UTRs were 
downloaded from ENSEMBL including ‘protein_coding’ and ‘nonsense-mediated 
decay’ types of transcripts. Intron sequences were excluded. Ribo-seq reads, which 
uniquely aligned to the genome in the initial alignment step using Tophat2, were 
aligned to 5’ UTR or CDS sequences using bowtie (options: ‘-m 1000 -v1’), 
allowing for multiple mappings to overlapping regions of the same gene. 

Statistical analysis of differential ribosome footprint densities and normalization. 
In concordance with other studies performed in yeast or mammalian cells*!**, we 
observed a characteristic 5’ ‘ramp’ of ribosome footprints at the translation start 
site of the CDS for our samples. It has been suggested that these excess footprints 
are a result of cycloheximide-inflicted accumulation of ribosomes”®. To prevent 
any artefactual bias in our analysis, we followed the instructions for normaliza- 
tion described previously*. Read counts were extracted that aligned to either 
(1) all full-length CDS (see Supplementary Tables 6-10) or (2) all CDS sequences 
without the initial 150 nucleotides (50 codons) corresponding to the ribosomal 
ramp (see Supplementary Table 5 and Fig. 5e). For both data sets, statistical tests 
were performed with the R/Bioconductor DESeq package. The two sets of DESeq 
scaling factors were subsequently used for normalization of data sets. The DESeq- 
normalized counts for all regions were divided by their length in kb to define 
ribosome footprint densities. 
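
The density calculation can be illustrated with a minimal sketch. The normalized counts are taken as given (in the study they come from DESeq). The helper that removes the first 150 nucleotides of a CDS mirrors option (2) above. All names are hypothetical.

```python
def trimmed_cds_length(cds_length_nt, ramp_nt=150):
    """CDS length after excluding the initial ribosomal ramp (50 codons)."""
    return max(cds_length_nt - ramp_nt, 0)


def footprint_density(normalized_count, region_length_nt):
    """Normalized footprint count divided by region length in kb."""
    if region_length_nt <= 0:
        raise ValueError("region length must be positive")
    return normalized_count / (region_length_nt / 1000.0)


if __name__ == "__main__":
    # Made-up numbers: 300 normalized counts on a 1,650-nt CDS without its ramp.
    length = trimmed_cds_length(1650)
    print(length, footprint_density(300.0, length))
```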

Analysis of ribosome footprint densities at 5’ UTRs. Reads that mapped uniquely to 
the genome by using Tophat2 were mapped to the 5’ UTR sequences with bowtie 
(options: ‘-m 1000 -v1’). Differences in ribosome footprint densities on the full 
5′ UTRs were evaluated by using the DESeq scaling factors obtained from the analysis 
of CDS (Supplementary Tables 7-9) for normalization, and DESeq to perform 
statistical tests for differences. DESeq-normalized counts for 5’ UTRs were divided 
by their length in kb to define ribosome footprint densities. 

Analysis of triplet periodicity. Footprints of length 28 nucleotides were extracted, 
since they report with high precision on the position of the ribosome“. The fre- 
quencies of the 5’ starts of the footprints, which were aligned close to the annotated 
translation initiation sites, were aggregated for all genes. 
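
A simple way to inspect triplet periodicity is to count how often the 5′ starts of 28-nucleotide footprints fall into each of the three reading frames relative to the annotated start codon. The sketch below assumes footprints are supplied as (offset, length) pairs, where the offset is the position of the 5′ start relative to the first nucleotide of the start codon. This input format and the function name are assumptions of this example.

```python
from collections import Counter


def frame_frequencies(footprints, footprint_length=28):
    """Fraction of footprint 5' starts in each reading frame (0, 1, 2).

    footprints: list of (offset, length) pairs aggregated over all genes;
        offset is relative to the first nucleotide of the start codon and
        may be negative (hypothetical format).
    Only footprints of exactly footprint_length nucleotides are counted.
    """
    frames = Counter(
        offset % 3  # Python's modulo is non-negative for negative offsets
        for offset, length in footprints
        if length == footprint_length
    )
    total = sum(frames.values())
    if total == 0:
        return {0: 0.0, 1: 0.0, 2: 0.0}
    return {frame: frames[frame] / total for frame in (0, 1, 2)}


if __name__ == "__main__":
    example = [(-12, 28), (-9, 28), (3, 28), (4, 28), (5, 30)]
    print(frame_frequencies(example))
```

A strong enrichment of one frame over the other two is the expected signature of translating ribosomes.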

Heatmap analysis. For the heatmap analysis in Extended Data Fig. 8d, we specifically 
selected the 43,625 representative and well-annotated protein-coding 
transcripts from GENCODE that overlap ENSEMBL transcript structures 
('ensembl_havana'). The positions of the start codons were obtained from the 
ENSEMBL 'Gene sets' gtf file (http://www.ensembl.org/info/data/ftp/index.html). 
Heatmaps of ribosome footprint densities (RPKMs) were generated for regions 
±1,500 nucleotides around the start codon by using ngsplot (https://github.com/ 
shenlab-sinai/ngsplot). 

If not indicated otherwise, Gene Ontology categories represent GOTERM_BP_ 
FAT in DAVID (http://david.ncifcrf.gov). 

Protein extraction and western blot analysis. To extract proteins from squa- 
mous tumours, samples were snap frozen in liquid nitrogen, transferred to lysis 
buffer (1% NP-40, 200 mM NaCl, 25 mM Tris-HCl, pH 8, 1 mM DTT) including 
protease inhibitor cocktail (Roche), homogenized and cleared by centrifugation 
at 13,000 r.p.m. To extract proteins from cells in culture, the same lysis 
buffer was added to the plate, the cells were scraped from the plate surface, left 
to lyse for 20 min on ice and the lysate was cleared by centrifugation. Total protein 
quantification was performed using the BCA Protein Assay (Thermo Fisher). Equal amounts 
of protein were run in polyacrylamide gels. Western blotting was performed as 
described previously". The following primary antibodies were used for western 
blot analyses: anti-PSAT (Protein Tech Group, 10501-1-AP), anti-THBS1 
(Santa Cruz, sc-65612), anti-SESN2 (Protein Tech Group, 10795-1-AP), anti- 
calreticulin (Abcam, ab2907), anti-INHBA (Sigma-Aldrich, SAB1408593), anti- 
NSUN2 (Aviva Systems Biology, ARP48811_P050), anti-K19 (Abcam, ab52625), 
anti-CD44 (IM7, Biolegend, 103004), anti-BCL10 (H197, Santa Cruz, sc-5611), 
anti-semaphorin 3A (SEMA3A) (Abcam, ab23393), anti-PSMD11 (Abcam, 
ab66346), anti-SPHK1 (Cell Signaling, 3297), anti-APTX (Abcam, ab31841), 
anti-Slug (Cell Signaling, 9585P), anti-Snail (Abcam, ab180714), anti-SOD2 
(Abcam, ab13534), anti-CLSPN (Bethyl Laboratories, A300-266A), anti-ZAK 
(Sigma, HPA017205), anti-CHAF1B or CAF1 p60 (Abcam, ab180371). Polyclonal 
anti-α-tubulin (Abcam, ab15246) was used as a loading control. Band intensity was 
quantified with ImageJ software. 

Statistical methods. Group data are always represented by mean and s.d., unless 
otherwise indicated in figure legends. To test statistical significance between 
samples, unpaired two-tailed Student's t-tests were used. To test for significance 
of populations (that is, stem cells versus differentiated cell populations) within 
one sample (mouse) we used the paired Student's t-test. To analyse the differences 
among group means we used analysis of variance (ANOVA). Violin plots were 
created using the vioplot package (https://cran.r-project.org/package=vioplot) 
in R. The outline of the violin plots represents the kernel probability density of 
the data at different values. Violin plots include a marker for the median of the 
data and a box indicating the interquartile range®. Boxplots were created with 
Prism 6 software. The box extends from the 25th to 75th percentiles and the line 
in the middle of the box is plotted at the median. The whiskers show minimum to 
maximum. Scatter plots, linear regression lines and coefficient of correlation (r’) 
were calculated using Prism 6 software by computing non-parametric Spearman 
correlation and two-tailed P values. 

Sample sizing and collection. No statistical methods were used to predetermine 
sample size, but at least three samples were used per experimental group and 
condition. The number of samples used in each experiment is indicated in Figs 1-6, 
in the legends of Extended Data Figs 1-10 and in Source Data files. 

Samples and experimental animals were randomly assigned to experimental 
groups. Sample collection was also assigned randomly. Experimental procedures 
in vitro, sample collection and data analysis were performed blindly whenever 
possible. Whenever possible automated quantifications were performed using the 
appropriate software. Most animal procedures (that is, mouse treatments) were 
performed blindly by individuals unaware of the experimental design. 

Extended Data Figure 1 | Protein synthesis in epidermal populations. 
a, Hair cycle stages and genetic lineage marking using K19- and LGR5-tdTom 
mice. Cell surface markers to isolate bulge stem cells are CD34 and ITGA6. 
Telogen: stem cells (CD34+/ITGA6+) are quiescent and resting in the bulge 
(BG). Early anagen: stem cells divide and give rise to committed 
progenitors in the hair germ (HG), which then grow downwards into the bulb 
(BU) surrounding the dermal papilla (DP). Late anagen: cells differentiate 
upwards to form the hair. Catagen: intermediate phase, when the hair bulb 
degenerates into a new resting bulge. IFE, interfollicular epidermis; SG, 
sebaceous glands. Mouse transgenes label K19- (red) and LGR5- (orange) 
positive stem cells and their progeny. b, OP-puro detection in mouse 
epidermis at all hair cycle stages. Dotted lines indicate hair follicle and 
epidermal basal layer. Arrows indicate OP-puro^high cells in the hair 
follicle. Arrowheads indicate OP-puro! cells in the interfollicular 
epidermis. Nuclei are stained with DAPI. c, d, tdTom and OP-puro detection 
in back skin of K19-tdTom and Lgr5-tdTom mice in telogen and late anagen. 
Arrows indicate tdTom+ cells. Arrowheads indicate Tomato+/OP-puro^high 
cells. Dotted line indicates lower bulge. Merged panels from c, d are shown 
in Fig. 1c-f. e, Hair follicle lineages and differentiation markers used in 
Fig. 1g-j. Ci, cuticle of inner root sheath; Ch, cuticle; Co, cortex; 
Cp, companion layer; He, Henle’s layer; Hu, Huxley’s layer; IRS, inner root 
sheath; Me, medulla; ORS, outer root sheath. f, P-cadherin and OP-puro 
detection in late anagen. Scale bars, 50 μm. 


© 2016 Macmillan Publishers Limited. All rights reserved 



Extended Data Figure 2 | Quantification of protein synthesis in 
epidermal populations. a, b, Top 2.5%, 10%, 25% and 50% translating 
epidermal cells (OP-puro^high) (a) were sorted for CD34 and ITGA6 (b). 
c, Protein synthesis in CD34+/ITGA6+, CD34−/ITGA6+ and CD34−/ITGA6− 
epidermal populations in the top 2.5%, 10%, 25%, 50% or 100% (all) 
translating cells at indicated hair follicle stages. d, Percentage of 
CD34+/ITGA6+, CD34−/ITGA6+ and CD34−/ITGA6− cells in the top 2.5%, 10%, 
25%, 50% or 100% (all) of translating epidermal cells at indicated stages 
of the hair cycle. Error bars show mean ± s.d. e, f, Violin plots of 
protein synthesis in top 2.5% OP-puro^high cells in tdTom™ epidermal cells 
sorted for CD34 and ITGA6 from K19-tdTom (e) or Lgr5-tdTom mice (f) at all 
stages of the hair cycle (n = mice). Source 
Data for this figure is available in the online version of the paper. 

Extended Data Figure 3 | Protein synthesis and cell cycle analyses 
in epidermal cells. a-c, Violin plots of protein synthesis in indicated 
epidermal populations sorted for K19-tdTom-positive (a) and 
LGR5-tdTom-positive (b) and -negative (c) populations. Protein synthesis is 
shown for top 10%, 25% or 50% OP-puro^high cells. d, e, Cell cycle analysis 
(d) and percentage of cells in G1/G0 or S/G2/M in the top 2.5%, 10%, 25% 
or 50% OP-puro^high cells in late anagen (e). Data represent mean ± s.d. 
f, Scatter plots correlating protein synthesis in the 2.5% OP-puro^high 
population with percentage of cells in S/G2/M (top) and G1/G0 (bottom) 
using all samples independent of hair cycle stage. Linear regression, 
correlation coefficient (r²) and P value are shown. g, Box plots 
of protein synthesis (top) and number of cycling cells (bottom) in the top 
2.5% translating cell populations (OP-puro^high). h, Box plots of protein 
synthesis in cycling (S/G2/M) and non-dividing (G1/G0) cells in the 2.5% 
OP-puro^high population isolated from Lgr5-tdTom mice. Shown are all cells 
(top), tdTom− (Tom−) (middle) and tdTom+ (Tom+) (bottom) cells at the 
indicated hair cycle stages. **P < 0.01, ***P < 0.001, ****P < 0.0001, 
two-tailed Student's t-test. n = mice. Source Data for this figure is available 
in the online version of the paper. 

Extended Data Figure 4 | Protein synthesis in squamous tumours. 
a-c, Co-labelling of OP-puro with markers for undifferentiated basal 
cells: ITGA6 (a), CD44 (b) and PDPN (c) in mouse tumours. Nuclei are 
stained with DAPI. Arrows indicate low translating and marker-positive 
cells. Dotted line indicates invasive front of the tumour. Boxed areas are 
magnified on the right. d, Gating of low, medium and high OP-puro cells 
in Nsun2+/+ (wild type) and Nsun2−/−; K5-Sos skin tumours analysed 
in e-g. e, Percentage of OP-puro! cells in tumours from Nsun2+/+ 
(wild type) and Nsun2−/−; K5-Sos mice. f, g, Flow cytometry for ITGA6 
and CD34 in unfractionated epithelial cells from mouse tumours (all cells) 
or epithelial cells with high, medium and low OP-puro incorporation (f) 
and quantification (g) (mean ± s.d.; n = 3 mice). h, Flow cytometry for 
ITGA6 and CD44 in unfractionated epithelial cells from mouse tumours. 
i, j, Histogram (i) and quantification (j) showing OP-puro incorporation 
of cells as gated and quantified in h (mean ± s.d.; n = 4 mice). 
k, l, Detection of endogenous expression of NSUN2 (LacZ) in early (P23) 
(k) and late (P30) anagen (l) hair follicles. Sections were co-stained with 
eosin or markers for bulge stem cells K15 and the hair lineages Huxley’s 
layer (Hu), cuticle (Ci) (GATA3), and cortex (Co) (LEF1). m-o, Haematoxylin 
and eosin staining (m) and immunostaining for LEF1 (n), K72 and DLX3 
(o) in wild-type (WT) and Nsun2−/− skin at P1. Nuclei are stained with 
DAPI. Insets: magnified boxed area (1, 2). Scale bars, 50 μm. p, Correlation 
between proliferation and protein synthesis with differentiation of 
quiescent (QSC) or committed stem cells (CSC), committed progenitors 
(CP), differentiating progenitors (DP), and terminally differentiated (TD) 
cells. Source Data for this figure is available in the online version of the paper. 

Extended Data Figure 5 | NSUN2 in mouse skin squamous cell 
carcinomas. a, Immunostaining for NSUN2, ITGA6, K10 (differentiation 
marker), laminin 5a and K8 (tumour progression markers), and Slug 
(epithelial to mesenchymal transition-related gene) at different stages of 
DMBA-TPA-induced malignant progression to squamous cell carcinoma 
(SCC). b-d, Quantification of tumour diameter normalized to body 
weight (BW) (b), tumours per mouse (c), and mouse life span (d) in 
K5-Sos/Nsun2+/+ (K5-Sos), K5-Sos/Nsun2+/− and K5-Sos/Nsun2−/− 
littermates. Measurements start at P16. Data collection discontinued 
when mice died (indicated by a dagger). Data represent mean, n > 5 
mice of each genotype. e, f, Haematoxylin and eosin staining (e) and 
immunostaining for ITGB1 (f) in sections from K5-Sos (K5-Sos/Nsun2+/+) 
and K5-Sos/Nsun2−/− skin tumours. b, basal undifferentiated cells; 
sb, suprabasal layers. Arrows indicate ITGB1+ cells. g, Relative mRNA 
expression levels of the indicated transcripts in skin tumours (mean ± s.d.; 
n = mice). h, Flow cytometry using ITGA6 and CD44 in K5-Sos/Nsun2−/− 
and control K5-Sos (K5-Sos/Nsun2+/+) tumours. i, Percentage of cells 
in cell populations as gated in h (mean ± s.d.; n = mice). *P < 0.05; 
***P < 0.001 (two-tailed Student's t-test) (i). j, TdT-mediated dUTP 
nick end labelling (TUNEL) assay on sections of K5-Sos tumours 
expressing (K5-Sos/Nsun2+/+) or lacking Nsun2 (K5-Sos/Nsun2−/−). 
Arrows indicate TUNEL+ (apoptotic) cells. Nuclei are stained with DAPI. 
Dotted line indicates boundary of epithelia and stroma (f, j). Scale bars: 
25 μm (a), 100 μm (e, f, j). Source Data for this figure is available in the 
online version of the paper. 

Extended Data Figure 6 | Deletion of Nsun2 enhances self-renewal 
of tumour-initiating cells in a cell-autonomous manner and NSUN2 
expression in human skin tumours. a, Tumour size after grafting of 
K5-Sos/Nsun2+/+ (K5-Sos) and K5-Sos/Nsun2−/− tumour cells subcutaneously 
into nude mice (mean ± s.d.; n = 3 mice). b-f, Histology (haematoxylin 
and eosin staining) (b), staining for GFP (c), Ki67 (d), ITGB1 (e) and 
PDPN (f) in grafted tumour sections. Dotted line indicates boundary 
between epithelia and stroma. Arrows indicate basal and suprabasal 
expression. Nuclei are stained with DAPI. g-l, Immunohistochemistry 
for NSUN2 in human normal skin, benign tumours, malignant basal cell 
carcinomas (BCC) and squamous cell carcinomas (SCC) with increased 
malignancy (stages classified using the TNM system). Arrows indicate 
NSUN2^high cells. Arrowheads indicate NSUN2^low cells. m, Distribution of 
cells shown in g-l according to NSUN2 protein levels (n > 3 samples). 
Scale bars, 100 μm. Source Data for this figure is available in the online 
version of the paper. 

Extended Data Figure 7 | NSUN2-dependent RNA methylation 
of coding and non-coding RNA in mouse tumours. a, Percentage 
of NSUN2-methylated sites (>0.15 m⁵C in Nsun2+/+; <0.05 m⁵C in 
Nsun2−/−) out of all covered sites (left) and in non-coding RNA (ncRNA) 
or introns and exons (right). b, Methylation level in coding and 
non-coding RNAs (>0.15 m⁵C in Nsun2+/+; <0.05 m⁵C in Nsun2−/−; coverage 
>10 reads). c-e, Examples of NSUN2-targeted non-coding RNA (Rpph1) 
and mRNA (Elf1 and Dscaml1) in Nsun2+/+ (top) and Nsun2−/− (bottom) 
tumours. f, Number of NSUN2-methylated sites in exons 1 to 60 (top) or 
distance from the transcriptional start site (TSS) in introns (bottom). 
Plotted sites: >0.1 m⁵C in Nsun2+/+; <0.05 m⁵C in Nsun2−/−; coverage 
>10 reads. g, No correlation between NSUN2 methylation shown in b 
and RNA abundance in normal skin or K5-Sos skin tumours. Nsun2 is 
highlighted as a control. h, Venn diagram with no significant overlap 
between NSUN2-methylation targets shown in b and differentially 
translated mRNAs (P < 0.05; measured as ribosome density of Nsun2+/+ 
versus Nsun2−/− tumours). i-l, NSUN2 methylation in tRNAs (>0.15 
m⁵C in Nsun2+/+; <0.05 m⁵C in Nsun2−/−; coverage >10 reads) (i). 
Number and location of lost (red) or unchanged (grey) m⁵C sites in 
K5-Sos/Nsun2−/− tumours. Nucleotide position in tRNA is shown on the 
x-axis (j). Examples of NSUN2-targeted tRNAs in Nsun2+/+ (top) and 
Nsun2−/− (bottom) K5-Sos tumours (k, l). Heatmaps show methylated 
(red) and unmethylated (grey) cytosines. Cytosines are shown on the 
x-axis, and sequence reads on the y-axis. Numbers indicate the m⁵C 
position in the RNA (c-e; k, l). Bisulfite-seq and RNA-seq data represent 
average of 4 replicates per condition. 

Extended Data Figure 8 | Nsun2 deletion drives translational changes 
independent of mRNA expression. a, Ribosome profiling and 
RNA-sequencing experiments (see Fig. 5) using Nsun2-expressing (Nsun2+/+) 
and Nsun2-deficient (Nsun2−/−) K5-Sos skin tumours, or cultured human 
skin fibroblasts (NSUN2−/− line 1 and NSUN2−/− line 2 and healthy donors: 
NSUN2+/−, NSUN2+/+). HTS, high-throughput sequencing. b, Correlation 
between protein synthesis (ribosome footprint density) in Nsun2+/+ and 
Nsun2−/− tumours. c, Example of triplet periodicity in ribosome footprints 
(K5-Sos/Nsun2+/+, replicate 1) shown as number of reads against 
nucleotide position relative to the translational start site for all ORFs. 
d, Heatmaps showing ribosome footprint reads around the translational 
start site (0) in Nsun2+/+ and Nsun2−/− tumours (3 replicates per 
condition; ribosome density >0; colour indicates RPKM values of 
footprints). e, Log2 fold change (FC) per transcript in normal skin 
(left) and tumour samples (right) of significant (P < 0.05) expression 
differences. Nsun2 RNA levels (red). f-j, Scatter plots, linear regression 
lines and coefficient of correlation (r²) of mRNA expression and protein 
synthesis (density of ribosome footprints per kb) in Nsun2+/+ (grey) and 
Nsun2−/− (red) mouse tumours (f) and human fibroblasts (g-j). k, Venn 
diagram of transcripts with significantly (P < 0.05) different ribosome 
footprint density in the 5′ UTR in NSUN2+/−, NSUN2−/− line 1 and 
NSUN2−/− line 2 human fibroblasts relative to NSUN2+/+ cells. l, Box plots 
of ribosome footprint read counts in the 5′ UTR (left) and corresponding 
CDS (right) of the 192 transcripts in k. ****P < 0.0001 (two-tailed 
Student’s t-test). 

Extended Data Figure 9 | RNA methylation-dependent changes of 
protein synthesis. a, Venn diagram of transcripts with differential protein 
synthesis in NSUN2+/− and NSUN2−/− human fibroblasts relative to 
NSUN2+/+ cells. b, GO terms enriched in 424 commonly differentially 
translated transcripts in NSUN2−/− lines (a). c, Western blot for NSUN2 
and tubulin in NSUN2−/− human fibroblasts rescued with viral constructs 
expressing wild-type NSUN2 (NSUN2-wt), two catalytically dead mutants 
(C271A and C321A) or the empty vector. d, Venn diagram of differentially 
translated transcripts in the indicated rescued cells relative to empty 
vector-infected control cells. Translation of 173 out of 746 transcripts 
(23%) depended on the enzymatic activity of NSUN2. e-g, Differential 
translation of transcripts relative to NSUN2−/− cells (infected with empty 
vector) showing reduced translation in the presence of wild-type 
NSUN2 but not the enzymatic-dead versions of NSUN2 (C271A, 
C321A), corresponding GO categories (f) and examples (g). h, Boyden 
chamber migration assay towards epidermal growth factor (EGF) or 
control medium (ctr) using primary human keratinocytes transduced 
with a siRNA for NSUN2 (si_NSUN2) or a scrambled construct (si_ctr). 
Data represent mean ± s.d. (n = 3 assays). Western blot confirms 
downregulation of NSUN2 in the presence of the siRNA construct. 
i, j, Reduced motility in keratinocytes expressing the enzymatic-dead 
NSUN2 construct (K190M) (K190M: n = 13; NSUN2: n = 19 cells) (i). 
Western blot confirms equal protein expression levels of K190M and 
NSUN2 (j). k, Reduced differentiation in primary human keratinocytes 
expressing the enzymatic-dead NSUN2 (K190M). Staining for NSUN2, 
ITGA6 or involucrin (IVL) and nuclei (DAPI). Control: empty vector 
(left); NSUN2: wild-type NSUN2 (middle); K190M: enzymatic-dead 
NSUN2 (right). Arrows indicate NSUN2-expressing ITGA6−/IVL+ cells. 
Arrowheads indicate K190M-expressing ITGA6+/IVL+ cells. l, Flow 
cytometry for ITGA6 of keratinocytes transduced with NSUN2 (blue line, 
top panel), K190M (blue line, bottom panel) or the empty vector (eVector) 
(red line). Negative control (grey line) represents unstained cells. 
m, Quantification of IVL+ infected keratinocytes grown in suspension 
for 24 h to stimulate differentiation. *P < 0.05, **P < 0.01 (two-tailed 
Student's t-test) (h-m). Scale bar, 100 μm. Source Data for this figure is 
available in the online version of the paper. 

Extended Data Figure 10 | Protein expression differences, drug 
treatment of Nsun2−/− tumours and graphical summary. 
a, b, Western blot analysis of translationally repressed (a) or induced (b) 
mRNAs in Nsun2−/− (−/−) compared to Nsun2+/+ (WT) skin tumours 
with quantification of band densitometry on the right (mean ± s.d.; 
n = 3 mice). *P < 0.05, ***P < 0.001 (two-tailed Student’s t-test). 
c, d, Control and 5FU-treated tumours, before and after treatment. 
e, f, Immunohistochemistry for p53 in tumours shown in c, d. g-i, 
Immunostaining for cleaved caspase 3 (Cl-CASP3) (g), Ki67 (h), ITGA6 
and K10 (i) in K5-Sos tumours expressing (+/+) or lacking (−/−) Nsun2 
and treated with CDDP (see Methods). Scale bars, 100 μm. j, Graphical 
summary: (1) quiescent undifferentiated stem and progenitor cells are 
characterized by the absence of NSUN2 and low global protein synthesis; 
(2) upregulation of NSUN2 counteracts angiogenin-mediated cleavage 
of tRNAs through site-specific methylation of tRNAs, allowing increased 
translation of lineage-specific transcripts driving terminal differentiation; 
(3) cytotoxic stress inhibits NSUN2 and global protein synthesis in 
particular of lineage-specific transcripts and promotes an undifferentiated 
quiescent cell state. Yet cell survival after the insult requires re-methylation 
of tRNAs by NSUN2 (see (2)); (4) the inability to upregulate NSUN2 in 
response to the cytotoxic insult leads to cell death. Source Data for this 
figure is available in the online version of the paper. 

Dual targeting of p53 and c-MYC 
selectively eliminates leukaemic stem cells 


Sheela A. Abraham, Lisa E. M. Hopcroft, Emma Carrick, Mark E. Drotar, Karen Dunn, Andrew J. K. Williamson, 
Koorosh Korfi, Pablo Baquero, Laura E. Park, Mary T. Scott, Francesca Pellicano, Andrew Pierce, Mhairi Copland, 
Craig Nourse, Sean M. Grimmond, David Vetrie, Anthony D. Whetton & Tessa L. Holyoake 


Chronic myeloid leukaemia (CML) arises after transformation of a haemopoietic stem cell (HSC) by the protein-tyrosine 
kinase BCR-ABL. Direct inhibition of BCR-ABL kinase has revolutionized disease management, but fails to eradicate 
leukaemic stem cells (LSCs), which maintain CML. LSCs are independent of BCR-ABL for survival, providing a rationale 
for identifying and targeting kinase-independent pathways. Here we show—using proteomics, transcriptomics and 
network analyses—that in human LSCs, aberrantly expressed proteins, in both imatinib-responder and non-responder 
patients, are modulated in concert with p53 (also known as TP53) and c-MYC regulation. Perturbation of both p53 and 
c-MYC, and not BCR-ABL itself, leads to synergistic cell kill, differentiation, and near elimination of transplantable 
human LSCs in mice, while sparing normal HSCs. This unbiased systems approach targeting connected nodes exemplifies 
a novel precision medicine strategy providing evidence that LSCs can be eradicated. 


BCR-ABL1 is a chimaeric oncogene arising from the t(9;22)(q34;q11) 
chromosomal translocation. The resultant protein tyrosine kinase 
(PTK) drives signalling events¹ and transforms HSCs. BCR-ABL activity 
in HSCs causes CML, which, if untreated, is fatal. 

Tyrosine kinase inhibitors (TKIs), such as imatinib mesylate, are 
standard CML treatment and have improved survival, justifying the 
use of single-target therapies². However, these drugs do not kill the 
LSCs that maintain the disease³, resulting in ever-increasing costs to 
sustain remissions. TKI discontinuation in the best 10-20% of TKI 
responders led to relapse rates of 50-60%, reinforcing the need to 
understand and target CML LSCs⁴ with curative therapies. Recent 
studies suggest that LSC survival is BCR-ABL-kinase independent⁵ 
and BCR-ABL has functionality beyond PTK activity, explaining the 
shortcomings of TKIs⁶. 

We have applied systems biology approaches to patient material to 
identify key protein networks that perpetuate the CML phenotype, 
aiming to elucidate potentially curative therapeutic options. Using 
unbiased transcriptomic and proteomic analyses, the transcription 
factors p53 and c-MYC are identified as having defining roles in CML 
LSC survival. We demonstrate an integral relationship between p53 
and c-MYC in the maintenance of CML, and importantly, a potential 
therapeutic advantage they have as drug targets over BCR-ABL for 
eradication of CML LSCs. 


p53 and c-MYC mediate the CML network 

To interrogate perturbations in BCR-ABL signalling of potential 
therapeutic value, isobaric-tag mass spectrometry (MS) was used to compare 
treatment-naive CML and normal CD34+ cells. Fifty-eight proteins 
were consistently deregulated in three CML samples (Methods and 
Supplementary Table 1). Dijkstra’s algorithm⁷ and the MetaCore 
knowledge base (https://portal.genego.com/) were used to identify p53 and 
c-MYC as central hubs (Supplementary Table 2) in a CML network of 
30 proteins (Fig. 1a) predominantly downstream of the transcription 
factors, with significant enrichment for p53/c-MYC targets (Fisher’s 
exact test, P = 0.001). While the majority of proteins downstream of 
p53 were downregulated, those downstream of c-MYC included 
proteins up- or downregulated in CML, in keeping with c-MYC acting 
as an activator and repressor of gene transcription⁸. The deregulated 
network suggests an altered dependency on p53 and c-MYC in CML 
CD34+ cells. 
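
For readers unfamiliar with Dijkstra's algorithm, it finds the lowest-cost path from a source node to every other node in a weighted graph, which is one way to connect proteins through intermediate interactors. The Python sketch below runs it on a toy directed graph; the node names and edge weights are hypothetical, and it does not reproduce the MetaCore knowledge base or the network-building procedure used in this study.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path cost from `source` to every reachable node.

    `graph` maps each node to a dict of {neighbour: non-negative edge weight}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


# Hypothetical interaction graph: a hub with direct and indirect targets.
toy_graph = {
    "hub": {"A": 1, "B": 4},
    "A": {"B": 2, "C": 5},
    "B": {"C": 1},
}
print(dijkstra(toy_graph, "hub"))
```

In this toy graph the cheapest route from the hub to C runs through A and B rather than along the direct but costlier edges.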

This data set represents, to our knowledge, the first relative 
quantitative comparison of CML to normal CD34+ cells 
using MS. Importantly, CML-initiating cells reside within the 
CD34+CD38−Lin− subpopulation and may differ from bulk CD34+ 
cells. To substantiate the CML proteome observations and investigate 
regulation in LSCs, we examined relevant, primary CML 
transcriptomic data. Network protein levels correlated well with respective 
gene levels, in both LSCs (four independent data sets; Fig. 1b and 
Extended Data Fig. 1a-c) and CD34+ progenitors (Extended Data 
Fig. 1d, e). Correlations were stronger for the 30 network candidates 
compared to all 58 deregulated proteins; seven data sets showed 
significant gain in r² for network candidates (Extended Data 
Fig. 1a, d). The mutual information (MI) of proteomic/transcriptomic 
data for network proteins was significantly greater than random 
(Fig. 1c and Extended Data Fig. 1b, e). This consistent messenger 
RNA/protein correspondence, in both progenitors and LSCs, 
confirmed that the network was transcriptionally regulated, compatible 
with c-MYC and p53 function. 
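
Mutual information measures how much knowing one variable reduces uncertainty about another. The Python sketch below computes it for two lists of discrete labels (for example, up or down at the protein and transcript level) and compares the observed value with label-shuffled data. How the data were discretized and how the random expectation was generated are not described in this section, so both choices here, along with the function names and example labels, are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length label sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def permutation_p(xs, ys, n_perm=1000, seed=0):
    """Fraction of label shuffles whose MI is at least the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(xs, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids P = 0


# Hypothetical direction-of-change labels for eight network members.
protein = ["up", "up", "down", "down", "up", "down", "down", "up"]
transcript = ["up", "up", "down", "down", "up", "down", "up", "up"]
print(mutual_information(protein, transcript))
print(permutation_p(protein, transcript))
```

Identical label lists with balanced classes give 1 bit, and independent lists give 0 bits.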

Figure 1 | p53 and c-MYC network in CML regulation. a, Network 
analysis reveals that c-MYC and p53 are central in a putative CML network 
(n = 3 patient samples, n = 2 normal samples). b, Correlation between 
proteomic/transcriptomic deregulation in primitive CD34+Hst^lo Py^lo (G0) 
(top two panels); CD34+CD38− (second panel from the bottom) and 
Lin−CD34+CD38−CD90+ CML cells (bottom panel). Filled black circles 
indicate all protein/genes; filled red circles indicate network proteins/ 

p53 and c-MYC have key roles in oncogenesis and appear in many 
cancer networks. To distinguish true regulatory effectors, we assessed 
the bias towards outgoing versus incoming signalling (degree_out/ 
degree_in (d_out/d_in)) for p53 and c-MYC. We generated networks from 
deregulated proteins in (1) primary MS data sets”, (2) cell lines 
transduced with oncogenic PTKs driving haematological malignancies”, 
and from (3) 50 randomly generated protein sets. Our network falls 
outside the expected random distribution and no other data set exhibits 
greater downstream bias for p53 and c-MYC (Fig. 1d). These data 
support a novel network in, and unique to, CML, centred on p53 and 
c-MYC. 
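
The out-degree to in-degree ratio described above can be sketched in a few lines of Python. The edge list is a hypothetical toy network in which each pair is (regulator, target); the placeholder node names A, B, C and X are not taken from the study.

```python
def out_in_ratio(edges, node):
    """Ratio of outgoing to incoming edges for `node` in a directed edge list.

    Returns infinity when the node has no incoming edges.
    """
    out_degree = sum(1 for source, _ in edges if source == node)
    in_degree = sum(1 for _, target in edges if target == node)
    return out_degree / in_degree if in_degree else float("inf")


# Hypothetical directed edges: (regulator, target).
toy_edges = [("p53", "A"), ("p53", "B"), ("p53", "C"), ("X", "p53")]
print(out_in_ratio(toy_edges, "p53"))
```

A ratio well above 1 marks a node that mostly sends rather than receives signals, which is the downstream bias the comparison with random protein sets is testing.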


Validation of network candidates 

The CML network revealed well-characterized p53/c-MYC targets 
and proteins not previously associated with CML pathogenesis 
(Supplementary Table 3). To validate proteomic predictions 
(Fig. 1a), gelsolin, CIP2A (also known as KIAA1524), UCHL1, aldose 
reductase, p53 and c-MYC were assessed using western blotting and 
immunofluorescence (Fig. 2a, b). Protein expression of gelsolin, 
CIP2A, UCHL1 and aldose reductase was consistent with CML 
network predictions (Fig. 2a). Immunofluorescence was confirmatory, 
highlighting the dramatic difference in CIP2A expression between 
normal and CML cells, and the intracellular localization of gelsolin 
and aldose reductase (Fig. 2b). CML cells also expressed increased 
c-MYC and decreased p53 levels (Fig. 2a, b), correlating well with 
appropriate modulation of downstream targets. We therefore 
hypothesized that simultaneous p53 activation and c-MYC inhibition would 
kill LSCs. To assess dual hub requirement for CML survival, lentiviral 
short hairpin RNA (shRNA) constructs (Extended Data Fig. 2a) were 
employed. Knockdown of HDM2 (also known as MDM2; E3 ligase/ 
negative regulator for p53), c-MYC, or both, in CML CD34+ cells 
reduced viability and enhanced apoptosis; the combined effects were 
synergistic. In colony-forming cell (CFC) assays, effects were more 
dramatic with single or combined knockdown, strengthening the 
hypothesis that p53 and c-MYC are critical for the survival of CML 
cells (Fig. 2c-e and Extended Data Fig. 2b, c). We then investigated 
synergistic interactions between p53 and c-MYC, testing clinically 
tractable inhibitors. 


RITA and CPI-203 synergize to drive CML CD34+ cell kill 

To target identified hubs, we selected RITA (also known as 
NSC652287), which binds p53 and blocks its degradation, and CPI-203, 
a bromodomain and extra terminal protein (BET) inhibitor 
hindering transcription by disrupting chromatin-dependent 
signal transduction!*". As anticipated!®, c-MYC was downregulated 
8h after CPI-203 treatment (Extended Data Fig. 3a). CPI-203 
also reduced p53 at 8h. RITA subtly increased p53 by 8h, further 
enhanced by 24-48h. Dasatinib (Das) at 150 nM (a concentration 
achievable in patients to fully inhibit BCR-ABL!®) gradually 
reduced p53 levels and inhibited phosphorylation of STAT5 as 
previously observed!”'®. RITA with CPI-203 for 8h reduced both 
p53 and c-MYC, suggesting a dominant effect of CPI-203 at this 
early time point, but by 48h markedly increased p53 (Extended Data 
Fig. 3a-c). 

RITA or CPI-203 treatment of CML CD34+ cells for 72h reduced 
viability in a concentration-dependent manner and induced significant 
apoptosis; combining drugs resulted in further significant changes in 
these parameters (Fig. 3a, b and Extended Data Fig. 3d). Labelling with 
the cell division tracker carboxyfluorescein succinimidyl ester (CFSE) 
and CD34 antibody showed that as CML cells divided in the pres- 
ence of CPI-203, there was clear and rapid loss of CD34 expression 
not seen with RITA (Fig. 3c), suggesting that c-MYC (a predominant 
target of CPI-203; ref. 15) inhibition induces differentiation of CML 
CD34+ cells. Differentiation was further suggested by skewing of the 
morphology, size and number of CFCs (Fig. 3d). RITA decreased CFCs 
but did not affect colony types (Fig. 3d, e). By inducing apoptosis and 
differentiation, the drugs may synergize and enhance elimination of 
CML. By measuring drug-dose response’’, combination therapy was 
potently synergistic with combination indices (CIs) ranging from 0.07 
to 0.34 (Fig. 3a). Nutlin-3a (Nut), another HDM2 inhibitor, produced 
similar results. The effects of RITA and Nut were p53 dependent, as 
K562 cells lacking p53 were non-responsive (Extended Data Fig. 3e 
and 4a, b). Since CPI-203 + RITA reduced p53 at the early time point, 
sequential inhibition of HDM2 and c-MYC was tested using chemical 
and genetic approaches. Neither inhibition of HDM2 before c-MYC nor 
vice versa improved cell kill compared with simultaneous knockdown 
or drug inhibition, or compared with nilotinib (Nil) (Extended Data 
Fig. 4c, d). 
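
The combination index is not defined in this section. Assuming the widely used Chou-Talalay formulation (an assumption, since the cited method is not reproduced here), CI = d1/Dx1 + d2/Dx2, where d1 and d2 are the doses used together to reach a given fraction affected and Dx1 and Dx2 are the single-agent doses giving the same effect, obtained from the median-effect equation. Values below 1 indicate synergy. The Python sketch below uses hypothetical median-effect doses and slopes, not fitted values from this study.

```python
def dose_for_effect(fraction_affected, median_dose, slope):
    """Single-agent dose giving `fraction_affected` under the median-effect equation."""
    odds = fraction_affected / (1.0 - fraction_affected)
    return median_dose * odds ** (1.0 / slope)


def combination_index(d1, d2, fraction_affected, dm1, m1, dm2, m2):
    """Combination index for doses d1 and d2 that jointly reach `fraction_affected`.

    dm1, dm2 are median-effect doses and m1, m2 the slopes of each single agent.
    """
    dx1 = dose_for_effect(fraction_affected, dm1, m1)
    dx2 = dose_for_effect(fraction_affected, dm2, m2)
    return d1 / dx1 + d2 / dx2


# Hypothetical example: each drug is used at a quarter of its own median-effect
# dose, yet together they reach 50% effect.
print(combination_index(d1=2.5, d2=1.0, fraction_affected=0.5,
                        dm1=10.0, m1=1.0, dm2=4.0, m2=1.0))
```

On this scale the reported range of 0.07 to 0.34 would mean that the combination reached a given effect with far less drug than additivity predicts.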

Figure 2 | Validation of proteomic network. a, Network proteins, p53 
and c-MYC western blots using CML and normal CD34+ cells. For gel 
source data, see Supplementary Fig. 1. b, Network proteins validated by 
immunofluorescence in CML and normal CD34+ cells labelled green 
or red (far left), with the nucleus stained blue using 
4′,6-diamidino-2-phenylindole (DAPI; second left), overlays of images 
(second right), and 

CML patients receive a TKI, irrespective of response. We therefore 
assessed RITA and CPI-203 effects in imatinib-mesylate-pre-treated 
CML CD34+ cells. Imatinib mesylate neither ameliorated nor enhanced 
the efficacy of RITA and/or CPI-203 (Extended Data Fig. 4e). 


RITA and CPI-203 eliminate LSCs 
The CML network suggested an altered dependency on p53/c-MYC 
signalling. We therefore hypothesized that normal cells may be less 
susceptible to the drug combination. Treatment of normal CD34+ cells 
with single agents or combinations had no significant effects on cell 
counts. However, increased apoptosis was observed at higher CPI-203 
concentrations (2 or 5 μM) and with the highest combination (RITA 
25 nM, CPI-203 5 μM; Extended Data Fig. 5a, b). In CML cells, 
significant apoptosis was observed with all four CPI-203 and combination 
concentrations (Fig. 3b), confirming a therapeutic window. 

To confirm the in silico results that led to the prediction (Fig. 1b, c 
and Extended Data Fig. 1c) that an altered dependency on p53 and 
c-MYC extended to primitive LSCs, we exposed CML LSCs to the drug 
combination. LSCs were defined as either CFSE^max or CD34+CD38−. 
As shown previously'®, in comparison to untreated control, over 5 days 
the CFSE^max population persisted in response to Das and Nil, but was 
significantly reduced by CPI-203 alone and by combination treatment 
(Figs 3c and 4a, b). Over 72h, RITA with CPI-203 was also effective in 
synergistically eliminating residual CD34+CD38− cells (CI = 0.3-0.8; 
Extended Data Fig. 5c). 

Figure 4 | p53 and c-MYC abrogation in normal and primitive CML 
cells. a, CFSE/CD34-labelled CML cells. Combo, combination. 
b, Recovery of CFSE^max CML cells after 5 days treatment (n = 3 patient 
samples). c, Bone marrow (BM) analyses of human CML, replicated twice 

HSCs and LSCs are most stringently defined by their engraftment 
capacity at 16 weeks. We exposed CML CD34+ cells to RITA, CPI-203, 
the combination, or Das for 48h before transplantation into 
sublethally irradiated NSG mice (Extended Data Fig. 5d). Human CD45+ 
cells were detectable in peripheral blood at 8, 12 and 16 weeks and in 
bone marrow at 16 weeks post-transplantation. Das had no significant 
effect on NSG-repopulating CML LSCs, representing the most 
primitive long-term engrafting cells. In contrast, RITA, CPI-203, and the 
combination reduced engraftment as indicated by decreased CD45+, 
CD34+, CD33+, CD11b+, CD19+ and CD14+ cells (Extended Data 
Fig. 5e, f). Using a CML sample known to engraft both BCR-ABL+ and 
BCR-ABL− cells by double-fusion fluorescence in situ hybridization 
(D-FISH), there was a marked decrease in the long-term-engrafting 
potential of RITA-, CPI-203- or combination-treated leukaemic cells, 
with no significant effect on non-leukaemic populations (Fig. 4c, d and 
Extended Data Fig. 5g, h). Experiments using cord blood CD34+ cells 
confirmed the selectivity of RITA and CPI-203 for BCR-ABL+ versus 
BCR-ABL− stem cells (Fig. 4d). 


Mechanism of LSC kill and clinical scope 

To understand the mechanism(s) underlying reduction of CML stem 
and progenitor cells in response to RITA and CPI-203, RNA sequencing 
(RNA-seq) was performed. Of the 12,248 genes sequenced, 2,134 were 
identified as synergistically modulated by combination treatment; 166 
demonstrated extreme synergy (Fig. 5a). Moreover, 81% of the genes 
differentially expressed in response to the combination were deregulated
in the same direction with RITA or CPI-203 (χ²(1) = 891.93,
P < 0.01). While transcriptional responses to RITA or CPI-203 were
enriched for p53/apoptosis or c-MYC/differentiation (not found with
Nil), respectively, the combination induced enhanced or additional
enrichment of these molecular signatures and pathways (Fig. 5b,
Extended Data Fig. 6a–c and Supplementary Tables 5–7). Furthermore,
stem/progenitor markers CD34 and CD133 were dramatically downregulated
by CPI-203 and the combination, but not by RITA or Nil
(Extended Data Fig. 7b). The enrichment in the p53/apoptosis and 
c-MYC/differentiation pathways paralleled the in vitro phenotypic 
effects observed. Limited overlap in gene membership of the signatures 
identified in silico demonstrates that distinct molecular components 
contribute to single and combined drug responses (Fig. 5c). 

CML stem cell persistence is an issue for all CML patients; however,
many also exhibit or acquire TKI resistance or demonstrate a
more aggressive clinical phenotype’. These represent patients in
whom novel agents targeting p53 and c-MYC would first be tested.
To investigate whether the deregulated p53/c-MYC network is present
in both TKI-responder (TKI-R) and TKI-non-responder (TKI-NR)
patients?!, and in more advanced forms of CML”°, we considered
data from CD34+ CML cells derived from suitable patient cohorts.
Transcriptional expression of the network components was highly
correlated across all CML versus normal comparisons, irrespective
of TKI response or clinical phenotype (Fig. 5d). In keeping with these
in silico data, CD34+ cells from a TKI-NR patient showed high levels
of apoptosis after treatment with RITA and/or CPI-203 (Extended 
Data Fig. 7c), suggesting that these drugs should be of therapeutic 
value for such patients. 


RG7112/7388 and CPI-203/0610 therapy 

To progress the drug combination towards the clinic, we used complementary
preclinical mouse models and introduced RG7112 and
RG7388, both HDM2 inhibitors”, and CPI-0610, a BET inhibitor??;
drugs already advanced in clinical trials in humans. In the SCL-tTA-BCR-ABL1
double transgenic (DTG) leukaemia mouse model, BCR-ABL1,
driven from the stem cell promoter (SCL), is inducible in HSCs
by tetracycline withdrawal (tTA), resulting in a transplantable CML-like
disease with increased myeloid counts and splenomegaly'®!74.
After irradiation, C57BL/6 CD45.1+ mice were used as recipients
and CD45.2+ mice as DTG bone marrow donors (Extended Data
Fig. 8a). After transplantation (to synchronize leukaemia development
and assess transplantable LSCs), CML was induced. In mice and/or
rats, RG7112 at 50–200 mg kg⁻¹ and CPI-0610 at 15–60 mg kg⁻¹ have
demonstrated on-target effects in tumours”*”’. Excellent tolerability
was achieved with modest doses of RG7112 (50 mg kg⁻¹ once daily) and
CPI-0610 (15 mg kg⁻¹ twice daily, both for 4 weeks), selected to demonstrate
synergy. White blood cell and neutrophil counts returned to
non-leukaemic control levels with the drug combination, but not with
single treatments (Fig. 6a and Extended Data Fig. 8b). While CPI-0610,
Nil, and the combination significantly reduced spleen size (Fig. 6b),
only the combination significantly reduced donor leukaemic CD45.2+
cells in the bone marrow (protected by the niche), while simultaneously
allowing recovery of host normal CD45.1+ cells; the CD45.1:CD45.2
ratio changed from 20:80 (untreated) to 40:60 (combination) (Extended
Data Fig. 8c). At the stem cell level, donor leukaemic Lin−Sca-1+
c-Kit+ (CD45.2 LSK) cells were reduced by >60% by the combination,
while host LSKs were unaffected. None of the single arms reduced LSKs
(Fig. 6b–d), supporting the synergistic effects demonstrated in vitro.
To confirm that these therapeutic in vivo effects extended to human
CML, two cohorts of sublethally irradiated NSG mice were transplanted
with independent CML CD34+ samples and treated with RG7388
and CPI-203 (75–100 mg kg⁻¹ and 6–7.5 mg kg⁻¹, respectively) for
3–4 weeks. Of the single agents, only CPI-203 showed a consistent
effect. The drug combination, however, eliminated 95% of Ph+CD45+
and 88% of CD45+CD34+ subsets (Fig. 6e, f). These results were
significant as compared to vehicle (P < 0.001) or Nil (P = 0.0016
(CD45+); P = 0.0047 (CD45+CD34+)) and as compared to RG7388
(P = 0.0017 (CD45+); P = 0.0004 (CD45+CD34+)) or CPI-203
(P = 0.0046 (CD45+); P = 0.0008 (CD45+CD34+)), respectively
(Fig. 6e, f), again suggesting a high degree of synergy.

Figure 5 | Mechanism and clinical relevance of treatment. a, Molecular
synergy for 100 nM RITA, 1 µM CPI-203 and RITA plus CPI-203 (Combo)
24 h treatment (NDC, no drug control; n = 3 patient samples per arm);
mean expression of ‘all’ and ‘extreme’ synergistic genes summarized as
indicated. FC, fold change. b, Enrichment of p53 (far left), apoptosis
(second left), c-MYC (second right) and differentiation (far right)
Molecular Signatures Database (MSigDB) signatures. Asterisks
indicate significant enrichment specific to combination treatment.
c, Gene membership of three functional signatures. d, Comparison of
transcriptional profiles of TKI-R, TKI-NR and baseline CML versus
normal (left), and aggressive, indolent and baseline CML versus normal
(right) for our candidate network (Fig. 1a). FDRs calculated by
10,000 permutations.

Figure 6 | Targeting p53 and c-MYC in CML elicits synergistic kill
in BCR-ABL+ LSCs. a, White blood cell (WBC) counts and spleen
weights normalized to control (dotted line) (experiments replicated
twice, minimum n = 7 mice per arm; vehicle, no drug control). Combo,
combination treatment. b–d, Bone marrow stained for CD45.1/2 and
further gated on Lin−Sca-1+c-Kit+ (LSK). Drug treatments (experiments
replicated twice, minimum n = 5 mice per arm). e, f, NSG mice
in vivo treatment: bone marrow stained for human Philadelphia
chromosome (Ph)+CD45+ (left) and further gated on CD34+ cells
(right). g, Representative CD34+ dotplots (experiments replicated twice
(2 patient samples), minimum n = 9 mice per arm); mean ± s.e.m.
(P values: two-tailed Student’s t-test; *P < 0.05, **P < 0.01, ***P < 0.001).


Discussion 
This work demonstrates the potential of unbiased, systems approaches 
to uncover new therapeutic options by analysing primitive stem 
cell subsets from primary material. We found that p53 and c-MYC 
work together with BCR-ABL to shape LSC phenotype and show 
that modulation of both p53 and c-MYC is critical to drive syner- 
gistic enhancement of apoptosis and differentiation seen in vitro, 
in vivo, at the stem cell level and the molecular level by RNA-seq. 

p53 and c-MYC have individually been identified as proteins in CML 
pathobiology!”*>° and cancer*!-°3, but have not previously been con- 
sidered for dual targeting. In recent CML studies, enhanced LSC kill 
converged on p53 as the mediator of apoptosis'””>**. CML LSCs are
also susceptible to enhancement or depletion of c-MYC. After deletion 
of the E3 ligase FBXW7, c-MYC increases with p53, resulting in cell 
cycle entry and p53-dependent apoptosis”*"°. However, FBXW7 may 
not represent a viable drug target based on its role in haematopoiesis, 
tissue stem cells and, importantly, as a tumour suppressor. Currently,
there is interest in drugging the spliceosome machinery, particularly
for MYC-driven cancers; however, a therapeutic window remains to
be established and exploited with well-tolerated agents*>**. BET inhibition
is a rapidly expanding market with multiple agents in phase 1–2
development. These agents are well tolerated and demonstrate efficacy
in haematological malignancies. Resistance to BET inhibitors evolves
through epigenetic mechanisms*’; however, our combination approach
will be less susceptible to resistance.

Over the last decade CML has been transformed from a fatal 
cancer to a manageable disease with lifelong therapy. Despite recogni- 
tion that LSCs prevent cure, the paradigm established by TKIs means 
that novel drugs must be safe, supported by a clear therapeutic window 
and easy to administer. As a result, few preclinical studies have reached 
the clinic and trials fail owing to toxicity and poor recruitment****. 
CML is often regarded as a simple cancer, driven solely by BCR-ABL, 
yet we do not understand why targeting BCR-ABL does not eradicate 
LSCs, nor cure CML. Our work shows that BCR-ABL reprograms 
potent oncoproteins and tumour suppressors, to establish a signalling 
network that underlies the propagation of CML. Critically, we found 
that simultaneous perturbation of p53 and c-MYC, mechanistically 
driving modulation of p53, apoptosis, c-MYC and differentiation 
pathways, improved selective kill of LSCs as compared to TKIs. Nil 
was ineffective against these pathways, potentially explaining why TKIs 
are not sufficient to cure CML. The fact that the aberrant network was 
similarly regulated in TKI-Rs, TKI-NRs, and patients with aggres- 
sive and indolent CML, coupled with the availability of well-tolerated 
oral agents in an advanced stage of development, now offers an 
entirely novel approach for the treatment of CML, with the therapeutic 
potential to address CML LSC persistence and improve outcome for 
CML patients. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
Patient samples. Patient samples (PS) were leukapheresis products taken at time
of diagnosis with chronic-phase CML, with written informed consent in accordance
with the Declaration of Helsinki and approval of the Greater Glasgow and
Clyde National Health Service Trust Institutional Review Board. CD34+ cells were
enriched using CliniMACS (Miltenyi Biotec), with stem cell subsets purified by
FACS. CML CD34+ samples were cultured in serum-free medium (SFM) supplemented
with growth factors as described previously'®. Normal CD34+ cells were
CD34-enriched leukapheresis products or cord blood maintained as described
for CML CD34+ samples. All PS and relevant clinical data are summarized in
Supplementary Table 4. All in vitro work was performed with a minimum of 3 PS 
(3 biological replicates) unless otherwise indicated. Unless otherwise indicated 
each PS was analysed as an individual sample replicated once in an experiment. 
Cell lines. The HeLa cell line was obtained from the German Collection of 
Microorganisms and Cell Cultures (DSMZ; originally deposited by ATCC). 
HeLa cells were subcultured in RPMI 1640 (10% FCS plus 2 mM L-glutamine,
100 units ml⁻¹ penicillin and 100 µg ml⁻¹ streptomycin (Gibco-Life Technologies))
(passage 3). The K562 cell line (DSMZ) was subcultured in IMDM (10% FCS
plus 2 mM L-glutamine, 100 units ml⁻¹ penicillin and 100 µg ml⁻¹ streptomycin
(passage <6)); the cell lines were not authenticated between passage 2–6. Cell lines
were mycoplasma negative in DAPI, microbiological culture, RNA hybridization 
and PCR assays. 

CML cytoplasmic preparations for MS. CML and normal CD34+ cells were
thawed and cultured overnight as described**. Cytoplasmic preparations were 
prepared using the Active Motif Nuclear Extraction Kit (Active Motif). 
Materials. RITA (CAS 213261-59-7; catalogue no. 10006426), Nutlin-3a (CAS 
675576-98-4; catalogue no. 18585) (Cayman Chemical) and Nil (CAS 641571-10-0) 
(Selleck Chemicals) were stored as per manufacturer’s instructions. CPI-203 
and CPI-0610 were obtained from Constellation Pharmaceuticals and kept as a 
solid powder at room temperature. Das (Selleck Chemicals) was kept as a stock 
solution (10 mg ml⁻¹) in dimethylsulfoxide (DMSO; Sigma-Aldrich) and prepared
and stored in aliquots at −20 °C. Imatinib mesylate (LC Laboratories) was stored at
100 mM in distilled water at 4 °C. RG7112 and RG7388 were supplied by Roche. A
list of all antibodies used is provided in Supplementary Table 8. 

Proteomics. Methods used have been described'”. Twenty micrograms of protein 
was isobarically tagged (iTRAQ reagent, ABSciex). Peptides were identified by
reverse phase liquid chromatography tandem mass spectrometry (RP-LC-MS/MS)
on three different instruments: ABSciex Q-STAR Elite, Thermo LTQ Orbitrap
Velos, ABSciex TripleTOF 5600. For the 5600 and Elite, dried peptide fractions
were resuspended in 15 µl 3% (v/v) acetonitrile, 0.1% (v/v) formic acid and
20 mM citric acid. For each analysis, a 5 µl peptide sample was loaded onto a
nanoACQUITY UPLC Symmetry C18 Trap (5 µm, 180 µm × 20 mm) and separation
of the peptides was performed using nanoACQUITY UPLC BEH C18 Column
(1.7 µm, 75 µm × 250 mm). For the Orbitrap, 10% of the peptide sample was loaded
onto Acclaim PepMap µ-Precolumns; analytical separation of the peptides was
performed using Acclaim PepMap RSLC C18 Columns. Data were acquired using
the information-dependent acquisition (IDA) protocol. 

Elite and 5600 data were processed by a ‘thorough’ search against the
UniProtKB/SwissProt human database containing 532,146 sequence entries
using ProteinPilot Software 4.1, revision number 460, Paragon Algorithm 4.0.0.0. 
Orbitrap data were analysed using Proteome Discoverer 1.3. The data were 
searched using the MASCOT node of Proteome Discoverer with the UniProtKB 
database (release 2011_11). The proteins observed in the three data sets demon- 
strated that using multiple instruments enhanced coverage (Extended Data Fig. 1). 
Analysis and integration of MS proteomic data. MS data sets were filtered for 
peptides observed in all channels (one normal sample was removed from all 
experiments due to poor labelling). Deregulated proteins were identified using a 
threshold of mean ± 2 s.d. on CML versus normal log2 ratios. This candidate list
was refined to include only those proteins corroborated by (1) log ratio changes
of ±0.5 in murine Ba/F3 + BCR-ABL MS data’, (2) a complementary CD34+ cell
proteomics data set*®, and/or (3) all instruments within the current experimental
data set. A parallel, manual inspection retained candidates if (1) log ratios were
≥ ±1.3, or (2) log ratios were lower and neither alternative instrument reported
differential expression in the opposite direction; this manual selection step was 
blinded. Together, these filtering steps reduced the candidate list to 58 proteins 
(see Supplementary Table 1). 
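
The thresholding step above can be illustrated with a minimal sketch. It assumes the log2 ratios are held in a dictionary keyed by protein and that the sample standard deviation is used; the function name, the toy values and the protein labels are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def deregulated(log2_ratios, n_sd=2.0):
    """Return proteins whose log2 ratio lies outside mean +/- n_sd * s.d."""
    values = list(log2_ratios.values())
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (an assumption)
    lower, upper = mu - n_sd * sd, mu + n_sd * sd
    return {p: r for p, r in log2_ratios.items() if r < lower or r > upper}


# Hypothetical data: 20 near-unchanged proteins plus one strongly raised protein
ratios = {f"P{i}": 0.1 * ((i % 5) - 2) for i in range(20)}
ratios["P_up"] = 3.0
hits = deregulated(ratios)
```

Only the outlying protein passes the filter here; the corroboration steps listed above would then be applied to the surviving candidates.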
Formation of candidate network. The MetaCore implementation (13 June 2012) 
of Dijkstra’s shortest path algorithm’, a general purpose algorithm that identifies the 
shortest paths between ‘seed’ nodes of interest in a graph, was used to build a network 
around the 58 deregulated proteins (the graph used was the fully manually annotated 
MetaCore KB). Paths between seeds were limited to length = 2 and all shortest paths 
of the minimum length were retained in the resulting network. Topology statistics 
(Supplementary Table 2) were calculated using the igraph package in R. 
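
As an illustration only, the seed-to-seed path search can be sketched as a breadth-first search, which returns the same shortest paths as Dijkstra's algorithm when every interaction has unit weight. The sketch assumes a directed adjacency list and reads the length limit as at most two edges; it is not the MetaCore implementation, and the toy graph is hypothetical.

```python
from collections import deque
from itertools import permutations


def shortest_paths(graph, source, target, max_len=2):
    """All shortest directed paths source -> target with at most max_len edges."""
    found, best = [], None
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        n_edges = len(path) - 1
        if best is not None and n_edges > best:
            break  # every remaining path is longer than the shortest one found
        if path[-1] == target and n_edges > 0:
            best = n_edges
            found.append(path)
            continue
        if n_edges == max_len:
            continue  # do not extend beyond the length limit
        for nxt in graph.get(path[-1], ()):
            if nxt not in path:
                queue.append(path + [nxt])
    return found


def build_network(graph, seeds, max_len=2):
    """Union of the edges on all retained shortest paths between seed pairs."""
    edges = set()
    for s, t in permutations(seeds, 2):
        for path in shortest_paths(graph, s, t, max_len):
            edges.update(zip(path, path[1:]))
    return edges


# Hypothetical toy graph of directed interactions
toy = {"ABL1": ["TP53", "X"], "X": ["MYC"], "TP53": ["MYC"], "MYC": []}
network = build_network(toy, ["ABL1", "MYC"])
```

Both two-edge routes from ABL1 to MYC are kept, mirroring the statement that all shortest paths of the minimum length were retained.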
Transcriptomic data analysis. Three bulk CD34+ (1–3) and four primitive LSC
(4–7) CML chronic phase versus normal data sets are discussed: (1) E-MTAB-2581:
Affymetrix Human Gene 1.0 ST Array transcriptional data from newly diagnosed
CML chronic phase and normal progenitor cells (CD34+CD38+) from G-CSF-mobilized
peripheral blood. mRNA was extracted using the RNeasy Mini Kit
(Qiagen) and DNase I treated on columns using the RNase-Free DNase
Set (Qiagen). Affymetrix GeneChip analysis was performed using 50 ng of RNA
according to the manufacturer’s instructions; (2) Gene Expression Omnibus (GEO)
accession GSE47927 (ref. 40); (3) GEO accession GSE5550 (ref. 41); (4) ArrayExpress
accession E-MTAB-2581 (as described in (1) but for CD34+CD38− cells);
(5) GEO accession GSE47927 (ref. 40); (6) GEO accession GSE24739 (ref. 42); and 
(7) ArrayExpress accession E-MTAB-2508 (ref. 43). 

CEL files were obtained directly from collaborators or public repositories. All 
transcriptional data were RMA normalized with the exception of GSE5550 for 
which only the VSN“ normalized data were available. log2 scale expression values
were analysed using limma’® (P value correction by Benjamini-Hochberg“). In
calculating the correlations and MI statistics (Fig. 1b, c and Extended Data Fig. 1),
multiple probesets corresponding to single genes were median averaged. 
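
For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the step-up adjustment can be sketched as follows. This is a plain re-implementation of the standard procedure for illustration, not the limma code path, and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

The running minimum enforces monotonicity, so a gene with a smaller raw P value never receives a larger adjusted value than a gene ranked below it.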
Calculating correlations across multiple data sets. Ensembl’s web services 
were used to map between (1) human and murine data sets via orthologues and 
(2) human transcriptomic and proteomic data sets via HUGO Gene Nomenclature 
Committee (HGNC) symbols. The Bioconductor package biomaRt (v.2.18.0) was 
used in R (v.3.0.1). Pearson’s product moment correlation coefficients (r) were 
calculated across data sets for the 58 candidates and the 30 networked candidates 
($r_c$ and $r_n$, respectively, in Extended Data Fig. 1a, d). Resulting $r^2$ values quantify
the proportion of variability captured by the linear relationship, and $r^2_r$ is defined
as $r^2_n / r^2_c$, that is, the ratio of $r^2$ for the 30 networked candidates to the $r^2$ for the
58 candidates. FDRs for the $r^2$ observed were generated by considering 10,000
random samplings of each data set and counting the number of random samples
meeting or exceeding the $r^2_n$ and $r^2_r$ statistics.
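
The resampling-based FDR described above can be sketched as follows, under the assumption that each random sample is a gene subset of the same size as the candidate set and that the two data sets are supplied as paired vectors. Function names, the seed and the simulated data are hypothetical; only the squared Pearson correlation and the counting rule come from the text.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def resampling_fdr(x, y, subset, n_iter=10_000, seed=0):
    """Observed r^2 for `subset` and the fraction of random subsets matching it."""
    rng = random.Random(seed)
    observed = pearson_r([x[i] for i in subset], [y[i] for i in subset]) ** 2
    hits = 0
    for _ in range(n_iter):
        idx = rng.sample(range(len(x)), len(subset))
        r2 = pearson_r([x[i] for i in idx], [y[i] for i in idx]) ** 2
        hits += r2 >= observed  # count samples meeting or exceeding the statistic
    return observed, hits / n_iter


# Simulated data: 200 genes, of which the first 30 agree across both data sets
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [rng.gauss(0, 1) for _ in range(200)]
subset = list(range(30))
for i in subset:
    y[i] = x[i]
observed, fdr = resampling_fdr(x, y, subset, n_iter=1000)
```

The same counting scheme would apply to the ratio statistic by replacing the squared correlation with the ratio computed for each random draw.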

Calculation of MI. We calculated the MI values for 10,000 random subsets (of size 30) 
to generate a distribution of MI values that we would expect by chance (expression 
values binned between −3 and 3, bin width = 0.1; entropy package in R); these are the
values summarized by the distribution plots in Fig. 1c and Extended Data Fig. 1b, e.
The FDR values represent the proportion of random subsets that generate an MI 
greater than or equal to the MI for the network. 
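
A minimal sketch of a binned MI estimate is given below. It assumes the simple plug-in (empirical) estimator over the stated bins, with values outside the range clamped to the outermost bins; the R entropy package offers several estimators and the one used in the study is not specified here, so this choice is an assumption.

```python
from collections import Counter
from math import log


def bin_index(value, lo=-3.0, hi=3.0, width=0.1):
    """Map a value to its bin number, clamping to the first or last bin."""
    n_bins = round((hi - lo) / width)
    idx = int((value - lo) / width)
    return min(max(idx, 0), n_bins - 1)


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two equal-length vectors."""
    n = len(x)
    bx = [bin_index(v) for v in x]
    by = [bin_index(v) for v in y]
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # sum over occupied cells of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum(
        (c / n) * log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


# Hypothetical expression changes for four genes in two data sets
mi_same = mutual_information([-1.0, -1.0, 1.0, 1.0], [-1.0, -1.0, 1.0, 1.0])
mi_indep = mutual_information([-1.0, -1.0, 1.0, 1.0], [-1.0, 1.0, -1.0, 1.0])
```

Repeating the calculation on random subsets of size 30 and recording how often the value meets or exceeds the network value would give the FDR described above.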

Calculation of proteomic/primitive transcriptional consistency FDR. Of the 
30 proteins in Extended Data Fig. 1c, 21 showed consistency of deregulation in at 
least three of the primitive transcriptional data sets and the bulk proteomic data 
set. To assess how likely it would be to observe such consistency by chance, the 
data were randomly permuted 10,000 times and permutations exhibiting similar 
deregulation consistency (that is, deregulation correspondence across four data 
sets) were recorded. 

Topological analysis of p53/c-MYC in other MS data sets. Comparison of 
leukaemogenic PTK proteomic effects and three primary proteomics data sets, 
describing two types of breast cancer"! (3 ductal carcinoma in situ and 4 invasive 
carcinoma breast cancer patient samples with matched normal samples), three 
types of prostate cancer!” (24 non-aggressive, 16 aggressive, 25 metastatic prostate 
cancer patient samples and 10 normal samples) and cervical? cancer, were obtained 
directly from the authors. In each data set, deregulated proteins were identified 
using z-scores ± 2 when considering cancer versus normal log2 ratios and subjected
to the same MetaCore network building process as described earlier. In the case of 
the cervical cancer data set no network could be found and the data were removed 
from the analysis. In addition, 50 sets of 58 random proteins were generated from 
the list of all proteins observed across the three MS data sets and subjected to the 
same network building process. The ratio d_out/d_in was calculated to quantify the
bias of outgoing to incoming connections to/from p53 and c-MYC (Fig. 1d). 
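
The out-to-in degree ratio can be sketched for a directed edge list as below. The edge list is a hypothetical toy network and the handling of a node with no incoming edges (returning infinity) is an assumption, since the text does not say how that case was treated.

```python
def out_in_ratio(edges, node):
    """Ratio of outgoing to incoming edges for one node of a directed network."""
    d_out = sum(1 for src, _ in edges if src == node)
    d_in = sum(1 for _, dst in edges if dst == node)
    return d_out / d_in if d_in else float("inf")


# Hypothetical directed interactions (source, target)
edges = [
    ("TP53", "A"), ("TP53", "B"), ("TP53", "C"), ("D", "TP53"),
    ("MYC", "A"), ("E", "MYC"),
]
tp53_ratio = out_in_ratio(edges, "TP53")
myc_ratio = out_in_ratio(edges, "MYC")
```

A ratio above 1 indicates that the node acts mainly as a regulator of other network members rather than as a target.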
TKI-R/TKI-NR and aggressive/indolent CML transcription. The PB ‘validation’
set TKI-R/TKI-NR samples in GEO data set GSE14671 were integrated with
a material-matched CML versus normal data set (CD34+CD38+ cell data from
ArrayExpress E-MTAB-2581) using COMBAT” (Bioconductor package inSilicoDb
v.1.10.1). All probeset-to-probeset mappings between the Affymetrix HG
U133+2 and Affymetrix HuGe 1.0 ST chips (obtained via Bioconductor’s biomaRt 
package) were retained and used by COMBAT. The TKI-R and TKI-NR samples 
were compared to the integrated normal data (using limma) to generate logFC 
values representing differential expression (see left two lanes in each panel of 
Fig. 5d). The pattern of differential expression in the TKI-R/TKI-NR versus normal
comparisons was then compared to that of CML versus normal, as calculated
separately by limma in the material-matched data sets (E-MTAB-2581) (see right
lane in each panel of Fig. 5d) to provide a baseline CML versus normal comparison.
Transcriptional profiles for probe pairs corresponding to the 30 members of the 
candidate network were identified using HGNC symbols (data corresponding to 
TARDBP were removed due to the large number (66) of corresponding probe pairs). 
An FDR was calculated using 10,000 re-samplings of the merged data set to
describe the likelihood of observing a correlation as high or greater by chance. The
transcriptional data for the aggressive and indolent samples in ArrayExpress data
set E-MIMR-17 were processed as described above, but integrated with the CML
and normal CD34+ data set GSE5550 using all probeset-to-probeset mappings
between the Affymetrix U133a and Affymetrix HG Focus chips (again obtained 
using biomaRt). 

Enrichment of MSigDB signatures. For the candidate network analysis 
(the results of which are shown in Supplementary Table 3), MSigDB signatures 
(C2: curated genesets) were accessed via the c2BroadSets object in GSVAdata 
v.1.0.0 using R. Enrichment scores were calculated using the hypergeometric 
distribution as implemented by dhyper(). Signatures corresponding to CML, 
c-MYC and p53 related biology were extracted using appropriate regular expressions
on the signature name. In the RNA-seq analysis (Fig. 5b), the MSigDB signatures
(C2: curated genesets) were identified as significantly differentially expressed
in the TMM/VOOM-normalized RNA-seq data using GSVA* and limma”’.
Significant pathways were identified using an FDR = 0.05 threshold on corrected
P values*®. The p53, apoptosis, c-MYC and differentiation signatures were extracted
using appropriate regular expressions on the signature name. 
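
The hypergeometric calculation for the candidate network can be illustrated with a short sketch that mirrors the argument order of R's dhyper(x, m, n, k). Whether the enrichment score was the point probability or a tail probability is not stated, so the upper-tail helper is an assumption, and the example numbers are hypothetical.

```python
from math import comb


def dhyper(x, m, n, k):
    """P(X = x): x signature genes among k drawn, from m signature and n other genes."""
    if x < 0 or x > k or x > m or k - x > n:
        return 0.0
    return comb(m, x) * comb(n, k - x) / comb(m + n, k)


def upper_tail(x, m, n, k):
    """P(X >= x), a commonly used enrichment P value (assumed here)."""
    return sum(dhyper(i, m, n, k) for i in range(x, min(m, k) + 1))


# Hypothetical example: 5 of 30 network genes fall in a 200-gene signature,
# with 20,000 genes in the background
p_point = dhyper(5, 200, 19_800, 30)
p_tail = upper_tail(5, 200, 19_800, 30)
```

Here m is the signature size, n the number of background genes outside the signature, and k the size of the gene list being tested.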

Enrichment of PANTHER pathways. The top 1,500 differentially expressed genes 
(as ranked by increasing P value, calculated by limma”’) were identified comparing 
treated/untreated samples. These genes and their logFCs for each arm were 
uploaded to PANTHER (http://www.pantherdb.org/) and subjected to a Mann- 
Whitney U-test*® to identify enrichment of PANTHER pathways. Extended Data 
Figure 6b shows enrichment results (without Bonferroni correction) including the 
hypothesized direction of pathway deregulation. 

Cell counting and apoptosis assays. CML CD34+ cells were seeded at
1–2 × 10° cells ml⁻¹ before drug treatment and counted by trypan blue (Sigma-Aldrich)
exclusion. Apoptosis was quantified by staining with annexin-V-APC
and DAPI. In specified experiments, CML and human cord CD34+ cells were
labelled with CD34-APC and CD38-FITC or PerCP-Cy5.5 and sorted using
the FACSAria (BD). The selected CD34+CD38− cells were analysed 72 h after
drug treatments. To measure the dose-effect relationship of each drug and its
combination and to determine synergy, CIs were calculated using the Calcusyn
software package (BioSoft). Except where documented, all results are expressed
as a mean ± s.e.m.
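
To show how a combination index relates to single-agent dose-effect curves, the sketch below assumes the Chou-Talalay median-effect formulation on which CalcuSyn is based. It is not the software's code; the parameter names (median-effect dose dm, slope m) follow the usual notation, and all numbers are hypothetical rather than taken from the study.

```python
def dose_for_effect(fa, dm, m):
    """Median-effect equation solved for dose: D = Dm * (fa / (1 - fa)) ** (1 / m)."""
    return dm * (fa / (1.0 - fa)) ** (1.0 / m)


def combination_index(fa, d1, d2, dm1, m1, dm2, m2):
    """CI for doses (d1, d2) that together produce fraction affected fa."""
    return d1 / dose_for_effect(fa, dm1, m1) + d2 / dose_for_effect(fa, dm2, m2)


# Hypothetical example: each drug alone needs 100 and 1,000 units for 50% effect,
# whereas 25 plus 250 units in combination achieve the same effect
ci = combination_index(fa=0.5, d1=25.0, d2=250.0,
                       dm1=100.0, m1=1.0, dm2=1000.0, m2=1.0)
```

Under this formulation CI < 1 indicates synergy, CI = 1 additivity and CI > 1 antagonism, so the hypothetical value of 0.5 would be read as synergistic.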

CFC assay. CD34+ cells were treated for 72 h at the indicated concentrations of
RITA, CPI-203 and Das. Drug-treated cells (2,000 cells per plate) were washed 
and seeded in Methocult H4435 (STEMCELL Technologies). CML cells were 
transduced with lentivector constructs and sorted then washed and seeded into 
methylcellulose. Colonies were assessed 10-14 days after plating. 

Western blotting. CD34+ cells were lysed in RIPA buffer with inhibitors and
western blots were performed as per standard protocols. 

Immunofluorescence microscopy. CML CD34+ cells were left untreated or
treated with the indicated drugs for 24h. Cells were harvested and spotted onto 
slides coated with poly-L-lysine and fixed with 3.7% (w/v) formaldehyde and 
permeabilized using a 0.25% (w/v) Triton-PBS solution for 15 min. Cells were 
blocked with 5% (w/v) BSA-PBS and stained with primary and secondary antibodies. 
Cells were concurrently stained with DAPI. Cells were imaged using a Zeiss Imager 
M1 AX10 fluorescence microscope (Carl Zeiss) and subjected to deconvolution 
(AxioVision software; Carl Zeiss) for image manipulation. Fluorescent signal was
measured in three dimensions using the Image Processing and Analysis in Java (ImageJ)
program.

Tracking cell divisions. CFSE (Molecular Probes) and CD34 staining were 
performed and cell divisions were identified as described previously’. 
Lentivirus transduction. The pCMV-VSV-G and pCMV-HIV1 were provided 
by J. Rossi. The pLKO-GFP came from K. Kranc. The following optimized pLKO 
vectors were purchased from Open Biosystems and subcloned into the pLKO-GFP
vector: (1) TRCN0000003380: MDM2 shRNA bacterial stock NM_002392.x-1495s1c1
(ref. 51); (2) TRCN0000355728: MDM2 shRNA bacterial stock
NM_002392.3-1496s21c1; (3) TRCN0000174055: c-MYC shRNA bacterial stock
NM_002467.2-1377s1c2 (ref. 52); (4) TRCN0000039642: c-MYC shRNA bacterial
stock NM_002467.2-1377s1c1 (ref. 53).

Transduction of HeLa cell lines was performed at an MOI of 1–10 with 70–95% of
the cells expressing GFP after 48 h. For transduction, CD34+ cells were cultured
in medium supplemented with growth factors (IL-3 25 ng ml⁻¹, IL-6 10 ng ml⁻¹,
Flt-3L 100 ng ml⁻¹, SCF 50 ng ml⁻¹, TPO 100 ng ml⁻¹) for 48 h, followed by two
exposures to concentrated virus-containing supernatants (multiplicity of
infection = 5) via spinoculation. Cells were harvested 48 h after second transduction
and analysed or sorted for GFP positivity. 

Transduced viable cells (assessed as annexin-V−/DAPI− percentages multiplied
by the absolute cell count) are presented as a percentage of CML CD34+ cells
transduced with scramble control. 


Immunodeficient mouse engraftment. For the ex vivo drug studies CML (2 × 10°
cells per mouse) or cord blood (2 × 10°) CD34+ cells were cultured with the
indicated drugs (RITA 70 nM, CPI-203 1 µM and Das 150 nM). After 48 h cells were
transplanted via tail vein into female 8–10-week-old sublethally irradiated (2.5 Gy)
NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ NSG mice (The Jackson Laboratory). Human
cells were assessed by anti-human CD45 antibody analysed by flow cytometry. 
Specific cell subsets were detected using antibodies to human CD34, CD33, CD11b, 
CD14 and CD19 (mouse antibody table in Supplementary Table 8). 

For the in vivo drug treated NSG experiments, CML (2 × 10° cells per mouse)
CD34+ cells were transplanted via tail vein into female 8–10-week-old sublethally
irradiated (2.5 Gy) NSG mice (The Jackson Laboratory). After 4 weeks, mice
were treated with RG7388 (75–100 mg kg⁻¹, oral gavage once daily), CPI-203
(6–7.5 mg kg⁻¹, intraperitoneally twice daily) or Nil (50 mg kg⁻¹, oral gavage once
daily) for 3–4 weeks. Two CML samples were assessed separately, each performed
with 4–5 mice per drug arm per experiment. Results presented represent data
from both experiments (each experiment normalized to vehicle). To quantify the
frequency of BCR-ABL+ cells within the engrafted human CD45+ cells, dual-fusion
D-FISH was performed as previously described’.

Transgenic mouse model. Inducible (tetracycline (TET)-based) DTG 
(SCLtTAxBCR-ABL1) donor mice in a C57BL/6 (CD45.2) background were a gift 
from D. G. Tenen. B6.SJL-Ptprca Pepcb/BoyJ (CD45.1) recipients (a mixture of female
and males between 8-10 weeks old) were purchased from Charles River Laboratories. 
Bone marrow transplantation and analysis of disease. Bone marrow cells of 
DTG mice (1 x 10°) were injected into the tail veins of 10-week-old irradiated 
(2 doses of 4.25 Gy, 3h apart) recipients. TET was continued for 2 weeks after 
radiation. Tail vein bleeds were performed weekly after TET removal and
Gr1/Mac1 percentages (flow cytometry), white blood cells, neutrophils and
haemoglobin (Hemovet) were monitored. 

DTG in vivo drug treatment. Drugs were administered to DTG mice 5 weeks 
after transplantation, over a 4-week period. Nil (75 mg kg⁻¹ once daily), CPI-0610
(15 mg kg⁻¹ twice daily) and RG7112 (50 mg kg⁻¹ once daily) were all given by oral
gavage. For no drug control, mice were administered the vehicles at the same concentrations
and volumes as used for the combination arm. 

Flow analysis. Peripheral blood, bone marrow and spleen cells were stained using 
appropriate antibodies and analysed using a FACSCanto or FACSAria machine 
(BD Biosciences). 

Husbandry. All experiments were performed in accordance with the local ethical 
review panel, the UK Home Office Animals Scientific Procedures Act, 1986, and 
UK Co-ordinating Committee on Cancer Research (UKCCCR) and National 
Cancer Research Institute (NCRI) guidelines. 

Animals were kept in regulated facilities, monitored daily, and all experiments 
were carried out in compliance with UK Home Office guidelines. Mice were 
genotyped by Transnetyx. 

RNA-sequencing. 

Sample preparation. CML CD34+ cells were seeded in 48-well plates at
1–2 × 10° cells ml⁻¹, before drug treatment (RITA 50 nM, CPI-203 1 µM, Nil 5 µM)
for 24h. After treatment, RNA was extracted using RNeasy Plus Mini Kit (Qiagen). 
Library generation. RNA-seq libraries were generated using TruSeq Stranded 
Total RNA (part no. 15031048 Rev. E October 2013) kits. Ribosomal depletion 
was performed on 1 µg of RNA using Ribo-Zero Gold before a heat fragmentation
step aimed at producing libraries with an insert size between 120–200 bp.
Complementary DNA was synthesized from the enriched and fragmented RNA 
using SuperScript II Reverse Transcriptase (Invitrogen) and random primers. 
The cDNA was converted into double-stranded DNA in the presence of dUTP to 
prevent subsequent amplification of the second strand. After 3’ adenylation and 
adaptor ligation, libraries were subjected to 15 cycles of PCR to produce RNA-seq 
libraries. Before sequencing, RNA-seq libraries were qualified and quantified via 
Caliper’s LabChip GX (part no. 122000) instrument using the DNA High Sensitivity 
Reagent kit (product no. CLS760672). Quantification of libraries for clustering was 
performed using the KAPA Library Quantification Kits for Illumina sequencing
platforms (kit code KK4824) in combination with Life Technologies QuantStudio 
7 real-time PCR instrument. Libraries were finally pooled in equimolar 
ratios and sequenced on Illumina’s NextSeq500 platform using 75 bp paired-end 
high-output runs. 

Alignment and analysis. Sequencing reads were aligned to the genome (GRCh38/ 
release 80 primary assembly as obtained via ftp.ensembl.org) using Subread 
(v.1.4.6-p3). RNA-SeQC was used to confirm adequate mapping quality and 
gene-level counts were calculated using Subread’s featureCounts algorithm. 
Count data for each arm were normalized independently by TMM (as implemented 
in the Bioconductor package edgeR) and voom (as implemented in the 
Bioconductor package limma). Genes with <3 cpm in three samples were removed 
from further analysis. Differential expression was identified using limma (with 
Benjamini-Hochberg correction). 
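The Benjamini-Hochberg step can be illustrated with a minimal sketch. This is not the authors' code (their analysis used limma in R); the function name `benjamini_hochberg` and the example P values are assumptions made purely for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (q values) in input order."""
    m = len(p_values)
    # Indices of the P values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, so that each q value is the minimum
    # of its own scaled P value (p * m / rank) and those of all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical P values for four genes.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Genes whose q value falls below the chosen threshold (0.05 in the synergy definition) would then be called significant.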
Definition of synergy. Genes were described as loosely synergistic if 
(1) RITA, CPI-203 and combined treatment all induced deregulation in the same 
direction, (2) the deregulation in response to the combined treatment was 
significant (q < 0.05), and (3) the deregulation induced by the combined treatment 
was greater than both RITA and CPI-203 in isolation. A more extreme definition 
had the additional criterion that the log ratio of the observed and additive effect be 
>0.6 (corresponding to a 150% increase). 
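The two synergy definitions can be expressed as simple predicates. The sketch below is one possible reading, not the authors' implementation: it assumes base-2 log fold changes, takes the "additive effect" to be the sum of the two single-agent logFC values, and compares magnitudes so that up- and downregulated genes are treated alike. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneEffect:
    rita: float     # logFC for RITA alone versus control
    cpi: float      # logFC for CPI-203 alone versus control
    combo: float    # logFC for the combined treatment versus control
    combo_q: float  # adjusted P value (q) for the combined treatment


def is_loosely_synergistic(g, q_cutoff=0.05):
    values = (g.rita, g.cpi, g.combo)
    # (1) all three arms deregulate the gene in the same direction
    same_direction = min(values) > 0 or max(values) < 0
    # (2) the combined treatment is significant
    significant = g.combo_q < q_cutoff
    # (3) the combined effect exceeds each single agent
    exceeds_single_agents = abs(g.combo) > max(abs(g.rita), abs(g.cpi))
    return same_direction and significant and exceeds_single_agents


def is_strictly_synergistic(g, excess_cutoff=0.6):
    if not is_loosely_synergistic(g):
        return False
    # Assumed additive expectation: sum of the single-agent logFC values.
    additive = abs(g.rita + g.cpi)
    return abs(g.combo) - additive > excess_cutoff
```

Under the base-2 assumption, an excess of 0.6 means the observed effect is about 1.5 times the additive expectation, since 2^0.6 ≈ 1.52.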

Statistical analysis. No statistical methods were used to predetermine sample 
size. For in vitro experiments a minimum of 3 patient samples were chosen as 
a sample size to ensure adequate power. For all animal studies, each experiment 
was replicated twice in the laboratory with a minimum number of 5 mice per arm, 
unless indicated. NSG mice were excluded from analyses if they died of radiation 
poisoning (within 10 days of being irradiated, out of a 16-week procedure). For 
DTG mice, mice were excluded from analysis if leukaemic cells (CD45.2) failed 
to engraft host mice (CD45.1*) and therefore would not develop leukaemia. This 
was determined 1 week before drug treatment. Patient samples were only excluded 
if clinical data identified patient sample as entering blast crisis. Pre-established 
criteria also included that if a sample data point deviated 2 standard deviations 
from the mean, it was to be excluded, but this was not applied to the data in 
main or extended data. Group allocation to mice was done as mice were either 
purchased and subsequently numbered or weaned, to remove any investigator bias. 
For both NSG and DTG mice studies, all mice were randomly assigned treatment 
groups, ensuring all animals were of equal health and leukaemic status, within 
normal variability. Mice were assessed at predetermined time points: NSG mice 
were assessed at 8 and 16 weeks and DTG mice were assessed 4 weeks after drug 
treatment, minimizing bias in assessing outcome. All mice were 
cared for equally in an unbiased fashion by animal technicians and investigator. 
No blinding was done. 

Unless indicated, data are presented as the mean ± s.e.m. and P values were 
calculated by two-tailed Student's t-test using GraphPad Prism software. Significant 
statistical differences (*P < 0.05, **P < 0.01, ***P < 0.001) are indicated. 
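As an illustration of these reporting conventions, the following sketch computes a mean and s.e.m. and maps a P value to the asterisk notation. It is not the authors' analysis (which used GraphPad Prism); the function names and the example values are assumptions.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of measurements."""
    mean = statistics.fmean(values)
    # s.e.m. = sample standard deviation / sqrt(n)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def significance_stars(p):
    """Map a P value to the asterisk notation used in the figures."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Hypothetical measurements from three patient samples.
mean, sem = mean_sem([2.0, 4.0, 6.0])
```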

Code availability. All computer code was implemented in R and is available from 
the authors upon request. 
Extended Data Figure 1 | BCR-ABL drives a proteomic signature 
mediated by p53 and c-MYC. a, b, Equivalent to Fig. 1b, c with additional 
information regarding the correlations calculated from the complete list 
of 58 candidate proteins (r,) in addition to the correlations for the 
candidate network (r,) and the background (79). Also shown is the gain 
in r’ obtained for the candidate network as compared to the r’ obtained 
for the candidate list as a whole (77). FDR calculated from 10,000 
re-samplings. c, Expression changes of the network components (shown 
as bar plots) in the context of quiescent and primitive CML cells; data 
shown in each panel (left to right) are (1) CD34+ protein log2 ratios 
(n = 3 patient samples, n = 2 normal samples); (2) CD34+Hst(lo)Py(lo) 
transcript logFC (ArrayExpress accession E-MTAB-2508); 
(3) CD34+Hst(lo)Py(lo) transcript logFC (GEO accession GSE24739); 
(4) CD34+CD38− logFC (ArrayExpress accession E-MTAB-2581); 
and (5) Lin−CD34+CD38−CD90+ logFC (GEO accession GSE47927). 
Down-/upregulation is indicated by turquoise/red, respectively. 
Where multiple probesets were found for individual genes, the 
probeset corresponding to the maximal log ratio was selected. 

d, e, Correlation of the candidate network in progenitor (CD34+) CML 
cells: CD34+CD38+ progenitor (top); common myeloid progenitor 
Lin−CD34+CD38+CD123+CD45RA− (middle); and CD34+ cells 
(bottom). As in a, b, correlations for the background (r0), candidate list 
(58 proteins, r,) and candidate network (Fig. 1a, r,) are shown. Also 
shown is the gain in r° obtained for the candidate network as compared 
to the r* obtained for the candidate list as a whole (774). FDR calculated 
from 10,000 re-samplings; MI statistics corresponding to FDRs < 0.05 are 
coloured red, FDRs < 0.10 are coloured grey. f, A Venn diagram showing 
the overlap in protein identification of the three MS instruments: ABSciex 
Q-STAR Elite (Elite), Thermo LTQ Orbitrap Velos (Orbi) and ABSciex 
TripleTOF 5600 (5600). 
Extended Data Figure 2 | Validation of network candidates. a, HDM2 
and c-MYC knockdown using shRNA constructs. Western blots of 
c-MYC, HDM2, p53 and HSP90 in HeLa cells transduced with lentiviral 
constructs specific for either c-MYC (2 constructs), HDM2 (2 constructs) 
or scrambled control (1 construct). KD, knockdown. b, c, CML CD34+ 
cells were transduced with either lentiviral (GFP) shRNA constructs to 
HDM2 (constructs 1, 2), c-MYC (constructs 1, 2) or scramble control 
(1 construct). b, Transduced viable GFP+ cells (assessed as annexin-V−/ 
DAPI−/GFP+ percentages multiplied by the absolute cell count) are 
presented as a percentage of CML CD34+ cells transduced with scramble 
control (n = 3 patient samples). c, Early apoptosis levels (assessed as 
annexin-V+/DAPI−/GFP+) after transduction of CML CD34+ cells (n = 3 
patient samples) as described in b. Statistical significance was calculated 
by a two-tailed Student's t-test and error bars represent the s.e.m. 
CPI-203 and combination drug treatment eliminates CD34+ CML cells (annexin V/DAPI) using flow cytometry techniques. 
Extended Data Figure 5 | RITA and CPI-203 selectively eliminate LSCs. 
a, b, Viable cell counts (n = 3 patient samples) (a); apoptosis in normal 
CD34+ cells (n = 3 patient samples) in response to RITA and/or CPI-203 (b). 
c, Gated CML CD34+CD38− cells 72 h after treatment (n = 4 patient 
samples). d, Ex vivo protocol for CML/cord blood CD34+ cells in NSG mice 
(n = 5 mice per arm). e, f, Targeting p53 and c-MYC in CML eliminates 
NSG repopulating leukaemic stem cells. CML CD34+ cells were treated 
with RITA (70 nM) and/or CPI-203 (1 µM) or Das (150 nM) for 48 h and 
recovered cells were injected intravenously into 8-12-week-old, 
sublethally irradiated (2.5 Gy) NSG mice (2-4 mice per arm). e, Percentage 
of human CD45+ cell levels in peripheral blood (PB) at 8, 12 and 16 weeks. 
f, Percentages of human CD45+, CD34+, CD33+, CD11b+, CD19+ and 
CD14+ cells in the bone marrow at 16 weeks. g, CML bone marrow analyses 
of CD33, CD11b, CD19 and CD14 from a CML sample determined to 
engraft both BCR-ABL-positive and -negative cells. h, D-FISH analyses of 
bone marrow human engraftment studies shown in g performed twice 
(2 patient samples) with a minimum of n = 6 mice per arm; mean ± s.e.m. 
(P values: two-tailed Student’s t-test; *P < 0.05, **P < 0.01, ***P < 0.001). 
Extended Data Figure 6 | Mechanism of LSC elimination and clinical 
scope. a, Enrichment of p53 (top); apoptosis (second from top); c-MYC 
(second from bottom); and differentiation MSigDB signatures (bottom) 
in the four treatment arms (n = 3 CML patient samples per arm) 
(columns named as per b). Equivalent to Fig. 5b, but with named MSigDB 
signatures. b, Enrichment of PANTHER pathways in the four treatment 
arms. Pathway enrichment calculated from the top 1,500 genes, as ranked 
by increasing P value (top); only those genes exhibiting an absolute FC 
of >0.5 in each arm (bottom left); only those genes exhibiting a P < 0.05 
in each arm (bottom right). c, Assessing molecular synergy of the combined 
RITA plus CPI-203 treatment, as compared to the individual RITA and 
CPI-203 arms of the RNA-seq experiments in the three 
in silico functional signatures: p53/apoptosis (left); c-MYC (middle); and 
differentiation (right). Mean expression is shown as a solid line. 
Extended Data Figure 7 | Mechanism of LSC elimination and 
clinical scope continued. a, Gene expression patterns (logFC, n = 3 
… upregulation at the bottom. Corresponding expression data are provided 
in Supplementary Tables 5-7. b, Differential expression of CD34 and 
Extended Data Figure 8 | RG7112 and CPI-0610 as a combination decrease BCR-ABL+ cells. a, b, DTG mice in vivo treatment (a): neutrophils 
normalized to control (dotted line) (b). c, Bone marrow cells stained for CD45.1/2. Drug treatment arms (minimum of n = 7 mice) mean ± s.e.m. 
(P values: two-tailed Student’s t-test; *P < 0.05, **P < 0.01, ***P < 0.001). 
TRPV1 structures in nanodiscs reveal 
mechanisms of ligand and lipid action 


Yuan Gao, Erhu Cao, David Julius & Yifan Cheng 


When integral membrane proteins are visualized in detergents or other artificial systems, an important layer of 
information is lost regarding lipid interactions and their effects on protein structure. This is especially relevant to proteins 
for which lipids have both structural and regulatory roles. Here we demonstrate the power of combining electron cryo- 
microscopy with lipid nanodisc technology to ascertain the structure of the rat TRPV1 ion channel in a native bilayer 
environment. Using this approach, we determined the locations of annular and regulatory lipids and showed that 
specific phospholipid interactions enhance binding of a spider toxin to TRPV1 through formation of a tripartite complex. 
Furthermore, phosphatidylinositol lipids occupy the binding site for capsaicin and other vanilloid ligands, suggesting 
a mechanism whereby chemical or thermal stimuli elicit channel activation by promoting the release of bioactive lipids 


from a critical allosteric regulatory site. 


Transporters and ion channels reside in biological membranes, 
where lipids have important structural and regulatory roles. 
However, structural characterization of protein-lipid interactions 
is challenging in detergent-based systems, making implementation 
of more native, lipid-based environments an important goal. 
For crystallographic approaches, this has been achieved through 
the use of lipidic-cubic phase systems or formation of two-dimensional 
crystals in lipid bilayers. For single-particle electron 
microscopy, one approach is to reconstitute proteins into spherical 
liposomes for random spherically constrained single-particle 
reconstruction. Another is to use lipid nanodiscs, hockey-puck-like 
structures in which a lipid bilayer patch is encircled by an amphipathic 
scaffolding protein. Both approaches mimic the native 
lipid environment and can enhance functionality and thermal 
stability. Moreover, nanodisc-embedded proteins are often 
monodisperse and especially suitable for single-particle electron 
cryo-microscopy (cryo-EM). Nevertheless, membrane 
protein structures determined with these systems have achieved 
limited resolution to date, failing to reveal detailed protein-lipid 
interactions. 

Cryo-EM can now be used to obtain structures of many biological 
macromolecules at near-atomic resolution. An important 
next goal is to enable cryo-EM to define interactions between small 
molecules and their protein targets at the atomic level. The heat- and 
capsaicin-activated ion channel, TRPV1, is an excellent model with 
which to address these challenges. This sensory receptor is modulated 
by membrane lipids and their metabolites, and activated or inhibited 
by various ligands, including vanilloid compounds and peptide 
toxins. Moreover, TRPV1 structures in multiple conformational 
states have recently been determined by cryo-EM under conditions 
in which purified channel protein was stabilized with an amphipathic 
polymer. These structures provide a standard against which other 
preparations can be assessed. Here we show that high-resolution structures 
can be obtained when TRPV1 is embedded in a nanodisc, and 
use this system to characterize channel-lipid interactions, revealing 
novel structural mechanisms underlying ligand binding and channel 
gating. 


Structure of TRPV1 in lipid nanodiscs 

We reconstituted purified TRPV1 protein into lipid nanodiscs gen- 
erated with different membrane scaffold proteins (MSPs) (Extended 
Data Fig. 1). For structural analysis, we favoured preparations using 
MSP2N2, which forms nanodiscs of ~150 Å diameter and is sufficient 
to accommodate TRPV1 without imposing spatial constraint 
(Extended Data Fig. 1d). Indeed, cryo-EM images of frozen hydrated 
samples revealed monodispersed TRPV1-nanodisc particles. Two-dimensional 
class averages showed TRPV1 tetramers with distinct 
channel features floating within the nanodisc (top view) (Fig. 1a). 
Transmembrane helices and cytoplasmic domains were clearly visible 
within a disc-like density contributed by the lipid bilayer (side views). 
Importantly, the presence of the bilayer and MSP did not preclude accu- 
rate image alignment. 

We determined three structures of TRPV1 in nanodiscs, including 
unliganded, agonist-bound, and antagonist-bound states at resolutions 
of 3.2, 2.9 and 3.4 Å, respectively (Fig. 1b and Extended Data Figs 2-4). 
These structures can be compared directly to those previously obtained 
in amphipol. Generally speaking, density maps determined with 
nanodiscs were of superior quality. This is especially evident when 


Figure 1 | TRPV1 structures determined in lipid nanodisc. a, Side and 
top views of reference-free two-dimensional class averages of TRPV1 in 
nanodiscs, showing transmembrane helices and lipid bilayer. The size 

of the class average windows is 233 Å. b, Side and top views of three-dimensional 
reconstruction of TRPV1-ligand-nanodisc complex. 
Individual channel subunits are colour-coded with two molecules of DkTx 
(purple) atop the channel and a molecule of RTX (red) in the vanilloid- 
binding pocket. Densities of the nanodisc (grey) and well-resolved lipids 
(blue) are also shown. 
examining side-chain densities within transmembrane regions or connecting 
loops that face lipids, such as S1 and S2 helices and the S2-S3 
linker (Extended Data Fig. 5a-f). Interestingly, improvements were not 
limited to transmembrane regions, but also extended to cytoplasmic 
domains, enabling us to build a model including previously unresolved 
regions (Extended Data Fig. 6a, b). These improved density features 
may reflect enhanced stability of the channel in the nanodisc, but other 
technical advances also contribute (Extended Data Table 1a). The 
nanodisc- and amphipol-stabilized structures of a given conformational 
state are essentially identical, albeit with some specific differences that 
may relate to lipid and/or ligand binding (see later). 

Two layers of continuous density corresponding to lipid head groups 
mark the bilayer boundaries and surround the channel (Fig. 1a, b and 
Extended Data Fig. 7a). Furthermore, well-resolved lipid-like densities 
associate with various regions of the channel, indicative of well-ordered 
lipids that form specific protein interactions (Fig. 1b and Extended 
Data Fig. 7b, c). These include annular lipids that fill crevices between 
subunits and reside within the outer leaflet surrounding pore-forming 
domains of the channel, reminiscent of voltage-gated potassium 
channels. We also observed lipids in hydrophobic clefts, as exemplified 
by a density within the lower segment of the S1-S4 domain, whose shape 
and local environment suggest that it represents a molecule of phosphatidylcholine 
(Extended Data Fig. 7c). Indeed, we observed a similar density 
in this location for TRPV1 in amphipol, suggesting that an endogenous, 
tightly bound lipid helps stabilize a hydrophobic crevice within the S1-S4 
domain, which remains stationary during channel gating. 


Lipid-channel-toxin tripartite complex 

TRPV1 can be stably trapped in its fully open state when exposed to 
resiniferatoxin (RTX), an ultra-potent vanilloid agonist, plus 
double-knot toxin (DkTx), a bivalent tarantula peptide that consists 
of two nearly identical inhibitor cysteine knot (ICK) motifs joined by a 
short (7-amino-acid) linker. Two DkTx molecules bind to one TRPV1 
tetramer such that each knot assumes a specific orientation with respect 
to the channel, suggesting that two DkTx molecules adopt an antiparallel 
configuration. In our nanodisc structure, we initially applied 
C4 symmetry to achieve optimal resolution, yielding a 2.9 Å map of 
the nanodisc-stabilized, RTX/DkTx-activated channel, compared to 
3.8 Å for the amphipol-stabilized complex (Fig. 2a and Extended Data 
Fig. 8). To gain further information about non-equivalent regions of 
the toxin, we applied C2 symmetry independently, which was insuf- 
ficient to reveal specific features associated with each knot and their 
relationship to one another, indicating that some particle images were 
misaligned by 90° around the symmetry axis. Focused classification 
on the toxin and adjacent regions enabled us to partially separate the 
two possible orientations to obtain an improved C2 averaged map, as 
evidenced by more pronounced features within the antiparallel linker 
connecting the ICK knots (Fig. 2b and Extended Data Fig. 8). 
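The effect of 90° misalignment on a C2-symmetric feature can be illustrated with a toy calculation. The sketch below is purely schematic and is not the image-processing procedure used for the reconstructions: a small grid stands in for a particle that has two-fold but not four-fold symmetry, and averaging it with a copy rotated by 90° yields a four-fold symmetric result in which the distinguishing features are blurred. All names and values are hypothetical.

```python
def rot90(grid):
    """Rotate a square grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def average(grids):
    """Element-wise average of equally sized square grids."""
    n = len(grids)
    size = len(grids[0])
    return [
        [sum(g[r][c] for g in grids) / n for c in range(size)]
        for r in range(size)
    ]


# Toy "particle": unchanged by a 180-degree rotation (C2) but not by 90 degrees.
particle = [
    [0, 1, 0],
    [2, 5, 2],
    [0, 1, 0],
]

# Half of the images aligned correctly, half misaligned by 90 degrees.
mixed = average([particle, rot90(particle)])
```

In `mixed`, the two non-equivalent positions (values 1 and 2) both become 1.5, so the average looks four-fold symmetric. Separating the two orientations, as focused classification aims to do, is what allows the non-equivalent features to be recovered.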

With improved maps, we rebuilt and refined the atomic model of the 
fully open channel with associated ligands. For DkTx, three canonical 
disulfide bonds are clearly resolved, as are most side chains in regions 
that interact with TRPV1 (Fig. 2a and Extended Data Fig. 8b). Here we 
find that residues involved in the channel interaction are highly con- 
served between ICK knots, consistent with the fact that the side-chain 
densities of these residues were well resolved even when C4 symmetry 
was applied. Interestingly, the density of the linker domain is also well 
resolved (Fig. 2c), revealing a taut and constrained conformation that 
probably contributes to the high-avidity interaction with the channel. 

The nanodisc system enabled us to determine where interactions 
occur with respect to the lipid bilayer. Two hydrophobic fingers from 
each ICK knot insert into the bilayer (Fig. 2a) and several phospholipid 
densities at these sites are well resolved, probably reflecting their stabilization 
through specific toxin interactions (Fig. 2c, d and Extended Data 
Fig. 7b). For example, a tryptophan side chain in finger 1 (Trp11 of knot 
1 and Trp53 of knot 2) interacts with the aliphatic tail of a phospholipid 
whose head group forms a polar interaction with Arg534 in TRPV1, 
a, Sequence of DkTx (top) showing location of intramolecular disulfide 
bonds and finger-like loops formed primarily by residues conserved 
between toxin knots (orange). Hydrophobic residues enable fingers to 
penetrate the lipid bilayer by ~9 Å (bottom). b, Schematic top-down view 
showing antiparallel arrangement of two DkTx molecules (purple) binding 
at subunit interfaces of a TRPV1 homo-tetramer (subunits are colour-coded). 
c, Cutaway view depicting one DkTx molecule interacting with 
two adjacent TRPV1 subunits (grey) and associated lipids (blue spheres; 
red and orange spheres depict phosphate head groups). Superimposed 
ribbon diagram (light blue) denotes location of transmembrane α-helices 
for one channel subunit. d, Detailed view of boxed region in c showing 
interactions between lipids and amino acid side chains from channel and 
toxin (dotted line, hydrogen bond). Helices from three neighbouring 
channel subunits are colour-coded as in b. 


located in the extracellular loop connecting the S3 and S4 helices. This 
sort of tripartite complex between toxin, lipid, and channel is also seen 
proximal to finger 2, where a phenylalanine side chain (Phe27 of knot 1 
and Phe67 of knot 2) is stabilized through hydrophobic interaction with 
an aliphatic lipid tail. Furthermore, the lipid head group is coordinated 
by the side chain of Ser629 at the top of the channel’s pore helix domain, 
as well as by interaction with Tyr453 from S1 of the adjacent channel 
subunit (Fig. 2d). Thus, together with the newly refined apo model, 
we see that gating-associated side-chain movements within outer pore 
loops and pore helices are more clearly visualized compared with our 
previous structures in amphipols (Extended Data Fig. 5b). These new 
observations demonstrate how potential side-chain clashes between 
DkTx and the apo channel are relieved through lateral shifts in the 
outer pore loops and pore helices, primarily through reorientation 
of aromatic side chains (Fig. 3a). Moreover, they suggest a structural 
mechanism for how toxin binding stabilizes the open state. 

The nanodisc preparation also reveals local distortions in the lipid 
environment associated with toxin binding. For example, insertion of 
DkTx into the bilayer results in lateral and upward displacement of 
a phospholipid adjacent to finger 1, as well as lateral and downward 
displacement of another phospholipid proximal to finger 2 (Fig. 3b). 
The resulting energetic penalty may be compensated by toxin-channel 
interactions, as well as by new interactions formed between the channel 
and displaced lipids (Figs 2d and 3b). Such a tripartite arrangement 
probably determines the overall affinity and kinetics of toxin binding. 


A resident lipid in the vanilloid pocket 

A particularly striking density within the vanilloid-binding pocket of 
the apo channel can be confidently interpreted as a phosphatidylinositol 
lipid whose branched acyl chains extend upwards between S4 of one 
subunit and S5 and S6 of an adjacent subunit, within a hydrophobic 


Figure 3 | Movement of protein and lipids associated with toxin 
binding. a, Movement of pore loop, pore helix, and part of S6 domain 
from closed (blue) to open (orange) states upon DkTx (purple) binding. 
Without such movement, one finger of DkTx would clash (yellow region) 
with the unliganded channel at the top of S6. Top-down view (right) 
shows two DkTx molecules atop TRPV1 (grey density). Toxin binding is 
associated with lateral shifts of the pore helix and loop (arrows), as well as 
large rearrangements of aromatic side chains within these regions. 

b, Two annular lipids (shown in blue, with phosphate in orange and 
oxygen in red) at the channel-toxin interface undergo both lateral and 
vertical movements upon DkTx binding. Dashed lines mark original 
position of phosphate groups in the absence of toxin (left); arrows indicate 
displacement of lipids in the presence of toxin (right). 


cleft facing the lipid bilayer. The inositol ring is bounded on each side by 
S3 and the elbow of the S4-S5 linker, with the TRP domain below (Fig. 4a). 
Polar interactions, such as that between Arg557 at the bottom of S4 and 
the hydroxyl group of the phosphate on position 1, or between Glu570 
in the S4-S5 linker and a hydroxyl group on position 6 of the inositol 
ring, further enhance stability (Fig. 4a and Extended Data Fig. 9a). 
Detailed analysis of the local protein environment suggests that addi- 
tional phosphate groups at positions 3, 4 and/or 5 of the inositol ring 
could form electrostatic interactions with Arg409 in a cytoplasmic 
N-terminal segment preceding S1 or Lys571 and Arg575 within the 
S4-S5 linker (Fig. 4a and Extended Data Fig. 9a, b). If so, then this 
pocket could favour a range of phosphatidylinositide species. 

A similar, albeit less well-resolved density was observed at this locale 
in our amphipol-stabilized structure, suggesting that a tightly associated 
lipid is retained during channel purification. This, or other 
associated lipids, may derive from the soybean lipid extract that was 
added to improve protein stability, but it is also possible that they are 
of cellular origin. 


Mechanism of vanilloid action 

We next examined the structure of the vanilloid pocket when occupied 
by various ligands (Extended Data Fig. 9c, d). With nanodiscs, we could 
discern ligand structures in much greater detail compared to amphipol-stabilized 
structures. For example, RTX could be precisely fit by its 
atomic structure (Fig. 4b), and in a manner consistent with mutagenesis 
and modelling studies. For the capsaicin-like homovanillyl ester 
moiety, key interactions include a hydrogen bond between Thr550 
and the carbonyl oxygen proximal to the vanilloid moiety, as well as 
between Ser512 and Arg557 and the vanilloid moiety at the hydroxyl 
Figure 4 | Shared binding pocket for phosphatidylinositol lipids and 
vanilloid ligands. a, Surface representation of TRPV1 (grey) in cutaway 
view revealing location of bound co-factor (blue). Superimposed ribbon 
diagram (yellow) denotes location of transmembrane α-helices for one 
channel subunit. Detailed view of boxed region shows how co-factor 
density (blue mesh) accommodates a molecule of phosphatidylinositol. 
Positive and negative side chains from S4 and the S4-S5 linker, 
respectively, can form ionic interactions with negatively charged 
phosphate or hydroxyl moieties on the inositol ring. Helices from a 
neighbouring subunit (light blue) are also shown. b, Density for RTX 
(red mesh) is well fit by its atomic structure. Residues essential for RTX 
sensitivity (Y511, M547, T550) lie in close proximity to the ligand and 
can engage in electrostatic or hydrophobic interactions. Densities for 
phosphatidylinositol and RTX define overlapping, but non-identical sites 
(see also Extended Data Fig. 9). 


group. Tyr511, which assumes distinct rotamers in apo versus liganded 
TRPV1 structures, engages in hydrogen bonding with the ester oxygen 
of RTX. The five-membered diterpene ring component of RTX 
is stabilized by hydrophobic interactions with several amino acids, 
including Leu515, Val518, Met547 and Ile573, as well as Leu669 from 
a neighbouring subunit. These residues form a hydrophobic pocket that 
accommodates the heterocyclic region of the drug, probably accounting 
for high-affinity binding of this potent agonist. 

Comparison of apo versus RTX-bound states suggests that vanilloid 
agonists function by displacing the resident phosphatidylinositol lipid. 
Indeed, RTX docks within the same pocket otherwise occupied by one 
acyl chain of the lipid. Absence of the other acyl chain allows for reori- 
entation of Tyr511 to further stabilize RTX binding (Extended Data 
Fig. 9c, d). At the same time, RTX binding coordinates interaction 
between Arg557 and Glu570 to re-occupy the space vacated by the inositol 
head group, consequently pulling the S4-S5 linker away from the 
central axis to facilitate opening of the lower gate (Fig. 5 and Extended 
Data Fig. 9e). This mechanism is further supported by analysis of a 
capsazepine-bound structure (determined in either amphipol or 
nanodisc), in which this competitive vanilloid antagonist occupies the 
same hydrophobic pocket as RTX, but apparently without facilitating 
the key interaction between Arg557 and Glu570 (Fig. 5 and Extended 
Data Fig. 9f). Indeed, mutations at these sites abrogate capsaicin-evoked 
responses, whereas charge-swapping mutations (R557E and E570R) 
partially restore channel function, consistent with our model. 
Parenthetically, we did not observe appreciable movement within the 
Figure 5 | Structural rearrangements associated with vanilloid binding. 
a, Ribbon diagrams depicting relative locations of S4, S4-S5 linker, S6 and 
TRP domain helices in the presence of phosphatidylinositol (blue, left), 
RTX (orange, middle), or capsazepine (gold, right). The vanillyl ring of 


S1-S4 region (Extended Data Fig. 9e), indicating that the static nature 
of this voltage-sensor-like domain, as previously described, is not 
merely an artefact of amphipol packing. 


Concluding remarks 

Membrane proteins have been reconstituted into lipid nanodiscs and 
studied by single-particle cryo-EM, but our results now show that 
this system can be taken to atomic resolution, enabling detailed structural 
analysis of lipid-protein interactions in a more native or stable 
environment. A main concern about using nanodiscs for cryo-EM was 
that the bilayer mass would weaken the power of image alignment and 
limit the achievable resolution of embedded proteins. Our results now 
show that this is not a problem. Indeed, as in the case of amphipol- 
stabilized TRPV1, the transmembrane core reached the highest 
resolution, indicating that image alignment was not adversely affected 
by the nanodisc. In addition to enabling visualization of specific, tightly 
bound lipids, the nanodisc provides a defined contour for the bilayer in 
relation to protein structure while revealing local deformations such as 
those associated with toxin binding. 

Biophysical and biochemical studies suggest that amphipathic ICK 
toxins, such as hanatoxin and SGTx1, first partition into the lipid 
bilayer, then engage their channel target through moderate-affinity 
protein-protein interactions. Furthermore, binding affinity may 
be enhanced by formation of a toxin-lipid-channel trimolecular 
complex. Our DkTx-bound TRPV1 structure supports this concept 
by showing that hydrophobic fingers of the toxin insert almost halfway 
(~9 Å) through the outer leaflet of the bilayer, interaction surfaces 
between DkTx and TRPV1 are not extensive, and membrane lipids form 
bridging interactions between toxin and channel (Fig. 6a). Indeed, we 
achieved considerably better resolution for the RTX/DkTx-bound channel, 
probably reflecting enhanced stability of such a tripartite complex. 
Overall, our findings are consistent with recent modelling studies based 
on an NMR structure of DkTx. Finally, DkTx is uniquely bivalent, 
and our structure suggests that the taut linker region connecting the 
two ICK knots has evolved to perfectly match the distance between 
subunit-binding sites, which, together with the specific antiparallel 
orientation of toxin binding, probably contributes to the remarkable 
avidity and specificity of the DkTx-TRPV1 interaction. 

Many TRP channels function as ‘receptor-operated’ channels that are 
modulated by phospholipase-C-mediated phosphatidylinositol-4,5-bisphosphate 
(PtdIns(4,5)P2) hydrolysis. However, structural mechanisms 
governing phosphatidylinositide-mediated regulation remain 
poorly understood. For TRPV1, it is not clear whether PtdIns(4,5)P2 or 
other phosphatidylinositides bind directly to the channel, or function as 
obligatory co-factors, allosteric inhibitors, or both. Moreover, channel 
domains that specify phosphatidylinositide sensitivity have not been 
unambiguously identified. We now show that phosphatidylinositides 
function as endogenous, tightly bound co-factors that stabilize TRPV1 
in its resting state by serving as competitive vanilloid antagonists and 
negative allosteric modulators. At the same time, phosphatidylinos- 
itides may function as positive, obligatory co-factors whose binding to 
RTX uniquely stabilizes the interaction between Arg557 and Glu570 to 
facilitate movement of the S4-S5 linker away from the central axis of the 
channel (indicated by red arrows), thereby facilitating opening of the lower 
gate through coupled movements (indicated by black arrows). 


TRPV1 in the closed state primes the channel for subsequent activation 
by vanilloids or other stimuli (Fig. 6b). Thus, our structures suggest a 
dual role for phosphatidylinositides through interactions at this single 
site. Moreover, structure-function studies suggest that regions within 
the TRPV1 C terminus interact with PtdIns(4,5)P2 (refs 38-41) and
thus additional mechanisms may contribute to phosphatidylinositide
regulation of TRPV1 or other TRP subtypes. Our findings, together
with those describing PtdIns(4,5)P2 interactions with inwardly rectifying
potassium channels (ref. 42), demonstrate that phosphatidylinositides
can interact with membrane proteins in diverse ways. It is tempting to 
speculate that temperature-dependent displacement of endogenous 
phosphatidylinositides contributes to heat-evoked activation of TRPV1 
(Fig. 6c). 



Figure 6 | Mechanistic models for TRPV1 activation. a, Proposed 
mechanism for DkTx action. Two hydrophobic fingers (purple and pink) 
of each ICK knot (joined by three intramolecular disulfide bonds, yellow 
lines) enable the toxin to partition into the lipid bilayer (grey shade) and 
subsequently target TRPV1. In the closed state, the upper pore region
of the channel (orange, pore helix; thick line, pore loop) undergoes
brief spontaneous excursions to an open state, enabling DkTx to dock.
Several annular lipids (blue ellipse with zigzag tails) bind at the channel-toxin
interface to further stabilize the open state through formation of
a tripartite complex. Resident phosphatidylinositides (blue hexagon
attached to red sphere with zigzag tails) in the vanilloid pocket may leave
upon toxin binding to facilitate allosteric opening of the lower gate.
b, Proposed mechanism for vanilloid agonist action. Phosphatidylinositide 
co-factor binds in vanilloid pocket to stabilize the channel in its closed 
state. Vanilloid agonist (red hexagon attached to grey ellipse) displaces 
phosphatidylinositide to facilitate formation of a salt bridge between 
Arg557 (dark blue branch) and Glu570 (red branch), consequently pulling 
the S4-S5 linker away from the channel’s central axis to open the lower 
gate. c, Heat may open the channel through a similar mechanism involving 
thermal displacement of resident phosphatidylinositides. 


© 2016 Macmillan Publishers Limited. All rights reserved 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 


Received 9 December 2015; accepted 31 March 2016. 
Published online 18 May 2016. 


1. Hilgemann, D. W. Getting ready for the decade of the lipids. Annu. Rev. Physiol.
65, 697-700 (2003).

2. Hille, B., Dickson, E. J., Kruse, M., Vivas, O. & Suh, B. C. Phosphoinositides
regulate ion channels. Biochim. Biophys. Acta 1851, 844-856 (2015).

3. Lee, A. G. Biological membranes: the importance of molecular detail. Trends
Biochem. Sci. 36, 493-500 (2011).

4. Caffrey, M. A lipid’s eye view of membrane protein crystallization in
mesophases. Curr. Opin. Struct. Biol. 10, 486-497 (2000).

5. Landau, E. M. & Rosenbusch, J. P. Lipidic cubic phases: a novel concept for
the crystallization of membrane proteins. Proc. Natl Acad. Sci. USA 93,
14532-14535 (1996).

6. Gonen, T. et al. Lipid-protein interactions in double-layered two-dimensional
AQP0 crystals. Nature 438, 633-638 (2005).

7. Wang, L. & Sigworth, F. J. Structure of the BK potassium channel in a lipid
membrane from electron cryomicroscopy. Nature 461, 292-295 (2009).

8. Bayburt, T. H., Grinkova, Y. V. & Sligar, S. G. Self-assembly of discoidal
phospholipid bilayer nanoparticles with membrane scaffold proteins. Nano
Lett. 2, 853-856 (2002).

9. Banerjee, S., Huber, T. & Sakmar, T. P. Rapid incorporation of functional
rhodopsin into nanoscale apolipoprotein bound bilayer (NABB) particles.
J. Mol. Biol. 377, 1067-1081 (2008).

10. Ritchie, T. K. et al. Reconstitution of membrane proteins in phospholipid bilayer
nanodiscs. Methods Enzymol. 464, 211-231 (2009).

11. Efremov, R. G., Leitner, A., Aebersold, R. & Raunser, S. Architecture and
conformational switch mechanism of the ryanodine receptor. Nature 517,
39-43 (2015).

12. Frauenfeld, J. et al. Cryo-EM structure of the ribosome-SecYE complex in the
membrane environment. Nature Struct. Mol. Biol. 18, 614-621 (2011).

13. Bai, X. C., McMullan, G. & Scheres, S. H. How cryo-EM is revolutionizing
structural biology. Trends Biochem. Sci. 40, 49-57 (2015).

14. Cheng, Y. Single-particle cryo-EM at crystallographic resolution. Cell 161,
450-457 (2015).

15. Kühlbrandt, W. Cryo-EM enters a new era. eLife 3, e03678 (2014).

16. Bevan, S., Quallo, T. & Andersson, D. A. Trpv1. Handb. Exp. Pharmacol. 222,
207-245 (2014).

17. Julius, D. TRP channels and pain. Annu. Rev. Cell Dev. Biol. 29, 355-384
(2013).

18. Cao, E., Liao, M., Cheng, Y. & Julius, D. TRPV1 structures in distinct
conformations reveal activation mechanisms. Nature 504, 113-118 (2013).

19. Liao, M., Cao, E., Julius, D. & Cheng, Y. Structure of the TRPV1 ion channel
determined by electron cryo-microscopy. Nature 504, 107-112 (2013).

20. Long, S. B., Tao, X., Campbell, E. B. & MacKinnon, R. Atomic structure of a
voltage-dependent K+ channel in a lipid membrane-like environment. Nature
450, 376-382 (2007).

21. Szallasi, A. & Blumberg, P. M. Resiniferatoxin, a phorbol-related diterpene, acts
as an ultrapotent analog of capsaicin, the irritant constituent in red pepper.
Neuroscience 30, 515-520 (1989).

22. Bohlen, C. J. et al. A bivalent tarantula toxin activates the capsaicin receptor,
TRPV1, by targeting the outer pore domain. Cell 141, 834-845 (2010).

23. Chou, M. Z., Mtui, T., Gao, Y. D., Kohler, M. & Middleton, R. E. Resiniferatoxin
binds to the capsaicin receptor (TRPV1) near the extracellular side of the S4
transmembrane domain. Biochemistry 43, 2501-2511 (2004).

24. Gavva, N. R. et al. Molecular determinants of vanilloid sensitivity in TRPV1.
J. Biol. Chem. 279, 20283-20295 (2004).

25. Hanson, S. M., Newstead, S., Swartz, K. J. & Sansom, M. S. P. Capsaicin
interaction with TRPV1 channels in a lipid bilayer: molecular dynamics
simulation. Biophys. J. 108, 1425-1434 (2015).

26. Jordt, S. E. & Julius, D. Molecular basis for species-specific sensitivity to “hot”
chili peppers. Cell 108, 421-430 (2002).

27. Phillips, E., Reeve, A., Bevan, S. & McIntyre, P. Identification of species-specific
determinants of the action of the antagonist capsazepine and the agonist
PPAHV on TRPV1. J. Biol. Chem. 279, 17165-17172 (2004).




28. Yang, F. et al. Structural mechanism underlying capsaicin binding and 
activation of the TRPV1 ion channel. Nat. Chem. Biol. 11, 518-524 (2015). 

29. Bevan, S. et al. Capsazepine: a competitive antagonist of the sensory neurone 
excitant capsaicin. Br. J. Pharmacol. 107, 544-552 (1992). 

30. Boukalova, S., Marsakova, L., Teisinger, J. & Vlachova, V. Conserved residues
within the putative S4-S5 region serve distinct functions among 
thermosensitive vanilloid transient receptor potential (TRPV) channels. J. Biol. 
Chem. 285, 41455-41462 (2010). 

31. Lee, S. Y. & MacKinnon, R. A membrane-access mechanism of ion channel 
inhibition by voltage sensor toxins from spider venom. Nature 430, 232-235 
(2004). 

32. Milescu, M. et al. Interactions between lipids and voltage sensor paddles 
detected with tarantula toxins. Nature Struct. Mol. Biol. 16, 1080-1085 
(2009). 

33. Milescu, M. et al. Tarantula toxins interact with voltage sensors within lipid 
membranes. J. Gen. Physiol. 130, 497-511 (2007). 

34. Bae, C. et al. Structural insights into the mechanism of activation of the TRPV1 
channel by a membrane-bound tarantula toxin. eLife 5, e11273 (2016). 

35. Hardie, R. C. TRP channels and lipids: from Drosophila to mammalian
physiology. J. Physiol. 578, 9-24 (2007). 

36. Qin, F. Regulation of TRP ion channels by phosphatidylinositol-4,5- 
bisphosphate. Handb. Exp. Pharmacol. 179, 509-525 (2007). 

37. Rohacs, T. Phosphoinositide regulation of TRPV1 revisited. Pflugers Arch. 467, 
1851-1869 (2015). 

38. Cao, E., Cordero-Morales, J. F., Liu, B., Qin, F. & Julius, D. TRPV1 channels are 
intrinsically heat sensitive and negatively regulated by phosphoinositide lipids. 
Neuron 77, 667-679 (2013). 

39. Prescott, E. D. & Julius, D. A modular PIP2 binding site as a determinant of 
capsaicin receptor sensitivity. Science 300, 1284-1288 (2003). 

40. Ufret-Vincenty, C. A. et al. Mechanism for phosphoinositide selectivity and 
activation of TRPV1 ion channels. J. Gen. Physiol. 145, 431-442 (2015). 

41. Ufret-Vincenty, C. A., Klein, R. M., Hua, L., Angueyra, J. & Gordon, S. E. 
Localization of the PIP2 sensor of TRPV1 ion channels. J. Biol. Chem. 286, 
9688-9698 (2011). 

42. Hansen, S. B., Tao, X. & MacKinnon, R. Structural basis of PIP2 activation of the 
classical inward rectifier K+ channel Kir2.2. Nature 477, 495-498 (2011). 


Acknowledgements We thank our laboratory colleagues, past and present, for 
many helpful discussions and manuscript critiques, C. Paulsen and E. Green for 
helping with initial screening for nanodisc reconstitution, and E. Palovcak for 
providing scripts for focused classification. This work was supported by grants 
from the National Institutes of Health (R01NS047723, R37NS065071 and
R01NS055299 to D.J., S10OD020054, R01GM098672, P01GM111126 and
P50GM082250 to Y.C.). Y.C. is an Investigator with the Howard Hughes Medical 
Institute. 


Author Contributions Y.G. carried out protein purification, nanodisc 
reconstitution, and detailed cryo-EM experiments, including data acquisition, 
image processing, atomic model building and refinement of TRPV1-nanodisc
complexes. E.C. carried out cryo-EM experiments of the TRPV1-capsazepine 
complex solubilized in amphipol. All authors contributed to experimental 
design, data analysis, and manuscript preparation. 


Author Information The three-dimensional cryo-EM density maps of the 
TRPV1-nanodisc complexes without low-pass filter and amplitude modification 
have been deposited in the Electron Microscopy Data Bank under accession 
numbers EMD-8118 (TRPV1-nanodisc), EMD-8117 (TRPV1-RTX/DkTx-nanodisc),
EMD-8119 (TRPV1-capsazepine-nanodisc) and EMD-8120 (TRPV1-capsazepine
in amphipol). Particle image stacks after motion correction related
to TRPV1-nanodisc and TRPV1-RTX/DkTx-nanodisc have been deposited
in the Electron Microscopy Pilot Image Archive (http://www.ebi.ac.uk/pdbe/emdb/empiar/)
under accession number EMPIAR-10059. Atomic coordinates
for the atomic models of TRPV1 in nanodisc, TRPV1-RTX/DkTx in nanodisc
and TRPV1-capsazepine in nanodisc have been deposited in the Protein Data
Bank under accession numbers 5IRZ, 5IRX and 5IS0. Reprints and permissions
information is available at www.nature.com/reprints. The authors declare no 
competing financial interests. Readers are welcome to comment on the online 
version of the paper. Correspondence and requests for materials should be 
addressed to D.J. (david.julius@ucsf.edu) or Y.C. (ycheng@ucsf.edu). 




METHODS 


Protein expression, purification and nanodisc reconstitution. A minimal functional
rat TRPV1 construct was expressed and purified as previously described.
Membrane scaffold proteins MSP2N2 and MSP1E3 were expressed and purified
from Escherichia coli, and detergent-solubilized TRPV1 protein was incorporated
into lipid nanodiscs as previously described, with modifications. Briefly, 2.5 mg
soybean polar lipid extract (Avanti) dissolved in chloroform was dried under an argon
stream and residual chloroform was further removed by vacuum desiccation
(~3 h). Lipids were then rehydrated in buffer (20 mM HEPES, 150 mM NaCl,
2 mM TCEP, 14 mM DDM, pH 7.4) and sonicated, resulting in a clear lipid stock
at 10 mM concentration. Purified MBP-TRPV1 protein (0.7-1.5 mg ml⁻¹) solubilized
in 0.5 mM DDM was mixed with the soybean lipid stock and MSP2N2
(~3 mg ml⁻¹) at various molar ratios and incubated on ice for 30 min. Specifically,
we achieved the best result using the ratio TRPV1 monomer:MSP:soybean
lipid = 1:1:150 to 1:1.5:225 for MSP2N2 and 1:1:100 for MSP1E3. Bio-beads SM2
(20 mg per 1 ml mixture, Bio-Rad) were added to initiate the reconstitution by
removing detergents from the system and the mixture was incubated at 4 °C for
1 h with constant rotation. A second batch of Bio-beads (equal amount) together
with TEV protease (40 µg per 1 mg TRPV1) was then added and the sample was
incubated at 4 °C overnight. Bio-beads were then removed and the reconstitution
mixture cleared by centrifugation before subsequent separation on a Superose
6 column (GE) in buffer (20 mM HEPES, 150 mM NaCl, 2 mM TCEP, pH 7.4).
Reconstitution was assessed by size-exclusion chromatography, SDS-PAGE, and 
negative-stain EM (Extended Data Fig. 1). The peak corresponding to tetrameric 
TRPV1 reconstituted in lipid nanodisc was collected for analysis by both 
negative-stain and cryo-EM. TRPV1-nanodisc particles were mono-dispersed 
as assessed by negative-stain EM (Extended Data Fig. 1c). No statistical methods 
were used to predetermine sample size. The experiments were not randomized. 
The investigators were not blinded to allocation during experiments and outcome 
assessment. 
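
As a rough illustration of the bookkeeping behind the molar ratios quoted in this section, the sketch below converts an amount of TRPV1 monomer into the matching amounts of MSP and lipid, and into a volume of the 10 mM lipid stock. The function names and the example input of 1 nmol TRPV1 are assumptions made for illustration; only the ratios, the stock concentration and the Bio-bead loading are taken from the text.

```python
def reconstitution_amounts(trpv1_nmol, msp_ratio=1.0, lipid_ratio=150.0,
                           lipid_stock_mM=10.0):
    """Return (MSP nmol, lipid nmol, lipid stock volume in µl).

    Ratios are expressed per TRPV1 monomer, e.g. 1:1:150 for MSP2N2.
    """
    msp_nmol = trpv1_nmol * msp_ratio
    lipid_nmol = trpv1_nmol * lipid_ratio
    # 1 mM equals 1 nmol per µl, so nmol divided by mM gives µl of stock.
    lipid_stock_ul = lipid_nmol / lipid_stock_mM
    return msp_nmol, lipid_nmol, lipid_stock_ul


def biobeads_mg(mixture_ml, mg_per_ml=20.0):
    """Bio-beads per addition at the quoted 20 mg per 1 ml of mixture."""
    return mixture_ml * mg_per_ml


# Hypothetical example: 1 nmol TRPV1 monomer at both ends of the MSP2N2 range.
low = reconstitution_amounts(1.0, msp_ratio=1.0, lipid_ratio=150.0)
high = reconstitution_amounts(1.0, msp_ratio=1.5, lipid_ratio=225.0)
print(low, high)
```

Converting a protein concentration in mg ml⁻¹ into nmol would additionally require the molecular weight of the construct, which is not stated here and is therefore left as an input.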

EM data acquisition and analysis. Grids of TRPV1-nanodisc complexes for
negative-stain EM were prepared following an established protocol. Specifically,
2.5 µl of purified TRPV1-nanodisc complex (0.05-0.1 mg ml⁻¹) was applied to
glow-discharged EM grids covered by a thin layer of continuous carbon film
and stained with 0.75% (w/v) uranyl formate. Negatively stained EM grids were
imaged on a Tecnai T12 microscope (FEI Company) operated at 120 kV. Images
were recorded at a nominal magnification of ×52,000 and a defocus set to −1.5 µm
using a 4k × 4k scintillator-based charge-coupled device camera (UltraScan 4000,
Gatan), corresponding to a pixel size of 2.02 Å on the specimen.

For cryo-EM, 2.5 µl of purified TRPV1-nanodisc complex (~0.5 mg ml⁻¹ concentration,
supplemented with 2.5% (v/v) glycerol) was applied to a glow-discharged
Quantifoil grid (holey carbon film with 1.2-µm hole size and 1.3-µm hole spacing
on 400-mesh Cu grid), blotted with a Vitrobot Mark III (FEI Company) using 8-s
blotting time with 100% humidity at 5 °C, and plunge frozen in liquid ethane cooled
by liquid nitrogen. For preparation of TRPV1-nanodisc in complex with agonists
or antagonist, reconstituted channel complex was mixed with RTX (final concentration
50 µM; molecular weight 628 Da) and DkTx (final concentration 20 µM;
molecular weight 8.5 kDa), or capsazepine (20 µM; molecular weight 377 Da),
20 min before vitrification, as described earlier.

Cryo-EM images of frozen hydrated TRPV1-nanodisc particles were collected
on a TF30 Polara electron microscope (FEI Company) equipped with a field emission
electron source and operated at 300 kV. Images were recorded at a nominal
magnification of ×31,000 using a K2 Summit direct electron detector camera
(Gatan) operated in super-resolution counting mode following an established
protocol. Images have a calibrated physical pixel size of 1.22 Å per pixel on the
specimen. The dose rate on the camera was set to be 8.2 counts (corresponding to
9.9 electrons) per physical pixel per second. The total exposure time was 6 s, leading
to a total accumulated dose of 41 electrons per Å² on the specimen. Each image
was fractionated into 30 subframes, each with an accumulation exposure time of
0.2 s. All dose-fractionated cryo-EM images were recorded using a semi-automated
acquisition program UCSFImage4 (ref. 45). Images were recorded with a defocus
in a range from −0.7 to −2.2 µm.
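
The exposure figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope estimate using the quoted values; the variable names are illustrative. It gives roughly 40 electrons per Å², in line with the reported 41 (the small difference presumably reflects rounding of the quoted pixel size and dose rate).

```python
# Values quoted in the text.
pixel_size_A = 1.22            # calibrated physical pixel size, Å per pixel
dose_rate_e_per_px_s = 9.9     # electrons per physical pixel per second
exposure_s = 6.0               # total exposure time, seconds
n_subframes = 30               # subframes per image

pixel_area_A2 = pixel_size_A ** 2                      # Å² covered by one pixel
dose_rate_e_per_A2_s = dose_rate_e_per_px_s / pixel_area_A2
total_dose_e_per_A2 = dose_rate_e_per_A2_s * exposure_s
frame_time_s = exposure_s / n_subframes                # exposure per subframe
dose_per_frame_e_per_A2 = total_dose_e_per_A2 / n_subframes

print(f"dose rate: {dose_rate_e_per_A2_s:.2f} e/Å²/s")
print(f"total dose: {total_dose_e_per_A2:.1f} e/Å²")
print(f"per subframe: {frame_time_s:.2f} s, {dose_per_frame_e_per_A2:.2f} e/Å²")
```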

Image processing. Defocus of all images was determined using CTFFIND4 (ref. 46).
Negative-stain EM images were 2 × 2 binned for particle picking and subsequent
image processing. SamViewer, an interactive image analysis program written in
Python, was used for all two-dimensional image display and manual particle picking.
Individual particles were manually picked, boxed out from the micrograph
and normalized to have a mean of 0 and a standard deviation of 1. Usually a total
of 1,000-2,000 particles were picked manually. For two-dimensional classification,
particles were first corrected for contrast transfer function (CTF) by flipping the
phase using ‘ctfapply’ (written by X. Li), and then subjected to ten cycles of correspondence
analysis, k-means classification and multi-reference alignment, using
SPIDER operations ‘CA S’, ‘CL KM’ and ‘AP SH’ (ref. 47). Two-dimensional class
averages generated from manually picked particles then served as references for
a subsequent automatic particle picking procedure implemented in a Python script
‘samautopick.py’, as previously described. All picked particles were then screened
visually and particles without clear, defined structural features were removed interactively.
The selected particles were again subjected to the same two-dimensional
analysis and two-dimensional class averages were assessed (Extended Data Fig. 1).
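
The binning, boxing and normalization steps described for negative-stain images can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names are hypothetical, binning is done by real-space averaging, and the micrograph is synthetic noise.

```python
import numpy as np


def bin2x2(image):
    """Bin a 2D image 2 x 2 by averaging each block of four pixels."""
    h, w = image.shape
    h2, w2 = h - h % 2, w - w % 2          # drop an odd trailing row/column
    blocks = image[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2)
    return blocks.mean(axis=(1, 3))


def box_particle(micrograph, center_y, center_x, box):
    """Cut a square box of side `box` around a picked coordinate."""
    y0, x0 = center_y - box // 2, center_x - box // 2
    if y0 < 0 or x0 < 0 or y0 + box > micrograph.shape[0] or x0 + box > micrograph.shape[1]:
        raise ValueError("box extends beyond the micrograph")
    return micrograph[y0:y0 + box, x0:x0 + box]


def normalize(particle):
    """Rescale a particle image to mean 0 and standard deviation 1."""
    p = particle.astype(np.float64)
    std = p.std()
    if std == 0:
        raise ValueError("cannot normalize a constant image")
    return (p - p.mean()) / std


# Synthetic stand-in for a micrograph (assumption: Gaussian noise only).
rng = np.random.default_rng(0)
micrograph = rng.normal(loc=5.0, scale=2.0, size=(64, 64))
binned = bin2x2(micrograph)
particle = normalize(box_particle(binned, 16, 16, 16))
```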

For cryo-EM images, dose-fractionated super-resolution image stacks of frozen
hydrated TRPV1-nanodisc images were first binned 2 × 2 by Fourier cropping,
resulting in a pixel size of 1.22 Å, for motion correction and further image processing.
Each image stack was subjected to whole-frame motion correction, followed
by correction at individual pixel level using the program UcsfDfCorr (written by S.
Zheng). A sum of all corrected subframes, calculated following a dose-weighting
scheme, was used for further processing. Particle picking was performed as
described earlier. Selected particles after visual screening were boxed out, and
subjected directly to maximum-likelihood-based three-dimensional classification
procedures implemented in RELION. A previous density map of TRPV1 solubilized
in amphipol (Electron Microscopy Data Bank accession EMD-5778) was low-pass
filtered to a resolution of 60 Å and used as an initial reference for three-dimensional
classification. Stable classes from three-dimensional classification were then iteratively
refined and reclassified to obtain the most homogeneous subset for the final
three-dimensional reconstruction. All refinements followed the gold-standard
refinement procedure, in which the data set was divided into two half sets that were
refined independently. Once refinement had converged, the final data set was
subjected to the ‘post-processing’ procedure of RELION, in which a soft mask
was calculated and applied to the two half-maps before the corrected Fourier
shell correlation (FSC) was calculated. Temperature-factor estimation and map
sharpening were also performed in this step using an automated procedure.
C4 symmetry was applied in all three-dimensional classification and refinement steps
unless specifically noted. The final resolution was estimated using the FSC = 0.143
criterion on corrected FSC curves in which the influences of the mask were
removed. Local resolution was estimated from unbinned and unsharpened raw
density maps using ResMap. The number of particles in each data set and other
details related to data processing are summarized in Extended Data Table 1b.
Conformations of TRPV1 alone or with ligands are very similar whether determined
in amphipol or nanodisc.
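
To make the FSC = 0.143 criterion concrete, the sketch below computes an FSC curve between two cubic half-maps with NumPy and reads off the resolution at the first shell that falls below the threshold. It is a simplified illustration, not RELION's post-processing: it applies no mask and no mask correction, and the function names are assumptions.

```python
import numpy as np


def fsc_curve(half1, half2):
    """FSC per integer-radius Fourier shell for two cubic maps of equal size."""
    if half1.shape != half2.shape or len(set(half1.shape)) != 1:
        raise ValueError("maps must be cubic and of identical shape")
    n = half1.shape[0]
    f1, f2 = np.fft.fftn(half1), np.fft.fftn(half2)
    k = np.fft.fftfreq(n) * n                       # integer frequency indices
    kz, ky, kx = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    n_shells = n // 2
    keep = shell < n_shells                         # ignore corners beyond Nyquist
    idx = shell[keep]
    cross = np.bincount(idx, weights=(f1 * np.conj(f2)).real[keep], minlength=n_shells)
    p1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[keep], minlength=n_shells)
    p2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[keep], minlength=n_shells)
    denom = np.sqrt(p1 * p2)
    return np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)


def resolution_at_threshold(fsc, box_size, pixel_size_A, threshold=0.143):
    """Resolution (Å) of the first shell whose FSC drops below the threshold."""
    for s in range(1, len(fsc)):
        if fsc[s] < threshold:
            return box_size * pixel_size_A / s
    # FSC never dropped below the threshold within the sampled shells.
    return box_size * pixel_size_A / (len(fsc) - 1)
```

In practice the two inputs would be the independently refined half-maps; reporting the first shell below the threshold (rather than interpolating between shells) is a simplification made here for brevity.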

For the TRPV1-RTX/DkTx nanodisc data set, two three-dimensional reconstructions
were first determined independently to resolutions of 2.95 Å with
C4 symmetry and 3.24 Å with C2 symmetry. These two reconstructions are very
similar. We then performed a three-dimensional classification focusing on DkTx
and its peripheral region in TRPV1, following a procedure outlined in Extended
Data Fig. 8a. Specifically, a volume that includes DkTx and peripheral densities
in TRPV1 was masked out from the C2-symmetrized three-dimensional reconstruction.
The density after masking was back-projected and convoluted with the
CTF to yield a two-dimensional image for all individual particles using its assigned
Euler angles and defocus parameters from the reconstruction. These images were
first scaled and normalized to the corresponding experimental particle images
and then subtracted from the experimental particle images, resulting in a particle
stack in which every particle image contains only signals for the focused region.
These procedures were implemented into a Python script ‘projection_subtraction.py’
(written by E. Palovcak) using the filt_ctf and math.sub.optimal functions from
the SPARX and EMAN2 libraries, respectively. The modified particle images
were then subjected to three-dimensional classification in RELION with a soft
mask around DkTx, and without further alignment. Two major classes representing
two possible orientations of DkTx (as judged by the linker region) were identified,
and unsubtracted particles belonging to each class were separated and used for
two independent reconstructions with pre-determined Euler angles. These two
reconstructions were aligned to each other using ‘fit in map’ in UCSF Chimera
and summed, yielding a density map with enhanced two-fold symmetry feature.
This density map was used as the reference model for a second round of focused
three-dimensional classification to further optimize the classification result.
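
The scale-and-subtract step at the heart of this focused analysis can be illustrated as follows. This is a hedged sketch, not the authors' script: it replaces the SPARX and EMAN2 routines named in the text with a plain linear least-squares fit, omits the CTF step entirely, and uses a hypothetical function name and synthetic images.

```python
import numpy as np


def subtract_projection(particle, projection):
    """Fit particle ~ a * projection + b by least squares, then subtract the fit.

    Returns the residual image together with the fitted scale a and offset b.
    """
    p = projection.ravel().astype(np.float64)
    x = particle.ravel().astype(np.float64)
    design = np.stack([p, np.ones_like(p)], axis=1)   # columns: projection, constant
    (a, b), *_ = np.linalg.lstsq(design, x, rcond=None)
    residual = particle - (a * projection + b)
    return residual, a, b


# Synthetic example (assumption): the particle is an exactly scaled, offset copy
# of the projection, so the residual should vanish.
rng = np.random.default_rng(1)
projection = rng.normal(size=(32, 32))
particle = 2.5 * projection + 0.3
residual, a, b = subtract_projection(particle, projection)
```

With real data the projection would be computed from the masked map using each particle's assigned Euler angles and modulated by its CTF before this step, and the residual would retain the signal from the focused region plus noise.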

We also determined the structure of capsazepine-bound TRPV1 in amphipol
A8-35 (Anatrace). In brief, TRPV1 (~0.5 mg ml⁻¹) in amphipol was mixed with
capsazepine (final concentration 50 µM) at 4 °C for ~30 min before application
to grids. Procedures for grid preparation, data acquisition and image processing
were the same as previously described. The final resolution of the reconstruction (3.8 Å)
was calculated using the ‘post-processing’ procedure of RELION, in which
a soft mask was calculated and applied to the two half-maps using default
parameters.

Model building. Atomic models of TRPV1 in apo (Protein Data Bank (PDB)
accession 3J5P) or fully open states (PDB accession 3J5Q), previously determined
when the channel was solubilized in amphipol, were initially docked into maps of
unliganded or agonist-bound TRPV1-nanodisc complex using UCSF Chimera.
With improved resolution and stability afforded by the nanodisc system, we were
able to remodel side chains and local geometry to higher accuracy. TRPV1 models
were first adjusted and real-space refined using COOT. The unliganded TRPV1
model was then used for modelling the capsazepine-bound structure with minor
adjustment owing to the high similarity between the two structures. DkTx was remodelled
according to the improved map from focused analysis (see earlier). All models
for ligands or associated lipids, except for RTX, were generated using the eLBOW
module in PHENIX together with their geometric constraints. The RTX model and
constraints were generated using the web server ‘PRODRG’. For simplicity, all
annular lipids in the structure were modelled as phosphatidylethanolamine (PE),
and the acyl chains of all lipids were modelled as 1-8 carbon length according to
specific densities. Models for all ligands were docked into densities and refined
using COOT. Full models of TRPV1 (residues 335-751, corresponding to well-resolved
regions in maps) in complex with ligands and lipids were then subjected
to global refinement and minimization in real space using the module
‘phenix.real_space_refine’ in PHENIX. For cross-validation, the refined structures were
first randomly displaced by 0.1 Å and then refined against one of the half maps
generated in RELION following the same procedures described earlier. FSC curves
were calculated between the refined model and half map 1 (‘work’, used in test
refinement), the refined model and half map 2 (‘free’, not used in test refinement),
and the refined model and summed map. The small gap between the work and
the free FSC curves indicated little effect of over-fitting of atomic models. The
geometries of all models were assessed using the ‘comprehensive model validation’
section in PHENIX and MolProbity, and detailed information is listed in
Extended Data Table 1b.

Figures were prepared using UCSF Chimera and two-dimensional electron 
microscopy images were extracted using SamViewer.

Extended Data Figure 1 | Reconstitution of TRPV1 into lipid nanodisc.
a, Size-exclusion chromatography of TRPV1 channel reconstituted into
lipid nanodisc using MSP2N2. Void volume and peaks corresponding
to TRPV1 and cleaved MBP are indicated. b, SDS-polyacrylamide gel
electrophoresis (SDS-PAGE) of detergent-solubilized MBP-TRPV1 fusion
protein and material from nanodisc reconstituted with TRPV1 following
MBP cleavage (middle peak in a). Note the presence of both bands for
TRPV1 and MSP2N2. c, Representative micrograph of negative-stained
TRPV1-nanodisc sample showing mono-dispersed and homogeneous
particles. d, Reference-free two-dimensional class averages of particles
in c, revealing band-like density contributed by the lipid disc (side view)
and tetrameric arrangement of channel subunits (top view). e, Two-dimensional
class averages of the same protein reconstituted into MSP1E3
nanodisc, which is smaller in diameter. Note the extra space within the
disc offered by MSP2N2 scaffold protein in d. The size of the class average
window is 258 Å.

Extended Data Figure 2 | Single-particle cryo-EM of unliganded
TRPV1 in lipid nanodisc. a, Representative raw micrograph of apo
TRPV1 in nanodisc. b, Fourier transform of image in a. Note that Thon
rings are visible at up to 3 Å. c, Gallery of two-dimensional class averages,
with size of window as 233 Å. d, Slices through the unsharpened density
map at different levels along the channel symmetry axis (numbers start
from extracellular side). e, Euler angle distribution of all particles included
in the calculation of the final three-dimensional reconstruction. Position
of each sphere (grey) relative to the density map (green) represents its
angle assignment and the radius of the sphere is proportional to the
amount of particles in this specific orientation. f, Final three-dimensional
density map coloured with local resolution in side and top views.
g, Fourier shell correlation (FSC) curves between two independently
refined half maps before (blue) and after (red) the post-processing in
RELION. h, FSC curves for cross-validation: model versus summed map
(purple), model versus half map 1 (used in test refinement, cyan), model
versus half map 2 (not used in test refinement, orange). Small differences
between the ‘work’ and ‘free’ curves indicate little effect of over-fitting.

Extended Data Figure 3 | Single-particle cryo-EM studies of agonist-bound
TRPV1 in lipid nanodisc. a, Representative raw micrograph of
TRPV1-RTX/DkTx in nanodisc. b, Fourier transform of image in a.
c, Gallery of two-dimensional class averages, with size of window as 233 Å.
d, Slices through the unsharpened density map at different levels along the
channel symmetry axis (numbers start from extracellular side). e, Euler
angle distribution of all particles included in the calculation of the final
three-dimensional reconstruction. Position of each sphere (grey) relative
to the density map (green) represents its angle assignment and the radius
of the sphere is proportional to the amount of particles in this specific
orientation. f, Final three-dimensional density map coloured with local
resolution in side and top views. g, FSC curves between two independently
refined half maps before (blue) and after (red) the post-processing in
RELION. h, FSC curves for cross-validation: model versus summed map
(purple), model versus half map 1 (used in test refinement, cyan), model
versus half map 2 (not used in test refinement, orange). Small differences
between the ‘work’ and ‘free’ curves indicate little effect of over-fitting.

Extended Data Figure 4 | Single-particle cryo-EM studies of antagonist-bound
TRPV1 in lipid nanodisc. a, Representative raw micrograph of
TRPV1-capsazepine complex in nanodisc. b, Fourier transform of image
in a. c, Gallery of two-dimensional class averages, with size of window as
233 Å. d, Slices through the unsharpened density map at different levels
along the channel symmetry axis (numbers start from extracellular side).
e, Euler angle distribution of all particles included in the calculation of
the final three-dimensional reconstruction. Position of each sphere (grey)
relative to the density map (green) represents its angle assignment and
the radius of the sphere is proportional to the amount of particles in this
specific orientation. f, Final three-dimensional density map coloured
with local resolution in side and top views. g, FSC curves between two
independently refined half maps before (blue) and after (red) the post-processing
in RELION. h, FSC curves for cross-validation: model versus
summed map (purple), model versus half map 1 (used in test refinement,
cyan), model versus half map 2 (not used in test refinement, orange).
Small differences between the ‘work’ and ‘free’ curves indicate little effect
of over-fitting.

Extended Data Figure 5 | Improved resolution for structures
determined in nanodisc. Comparison of density maps (blue mesh) [...]
Refined atomic models (gold, nanodisc; grey, amphipol) are fit to
corresponding densities. Side-chain densities were considerably improved
in nanodisc-stabilized TRPV1-DkTx/RTX structure (a, b), and notable [...]

Extended Data Figure 6 | Newly resolved TRPV1 cytoplasmic region
in nanodisc-stabilized structure. a, A region in the TRPV1 C terminus,
previously unresolved in amphipol-stabilized structures (blue), is clearly
resolved in the nanodisc-stabilized structure. b, Enlarged view of boxed
region in a showing density map (blue mesh) and superimposed model
(gold). Previously resolved TRP domain and N-terminal β-strands are
depicted in ribbon diagram format (cyan).

Extended Data Figure 7 | Categories of lipid densities observed in
TRPV1 structures. a, Two continuous layers of density (blue) contributed
by lipid head groups of bilayer within nanodisc are shown for apo channel
(left) and channel in complex with RTX-DkTx (right). b, Atomic model
of annular lipids could be built into well-resolved densities (blue mesh)
surrounding the channel protein. DkTx is shown as ribbon diagram
(pink). Top-down views show distribution of resolved annular lipids
(blue) in inter-subunit crevices at the outer leaflet of the membrane.
c, Well-resolved densities (blue mesh) in the structures representing a
phosphatidylcholine molecule (left) and a phosphatidylinositol molecule
(right). Transmembrane helices of TRPV1 close to the binding site are also
shown as ribbon diagrams (grey).

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Extended Data Figure 8 | Focused analysis of DkTx density map. a, Flow-chart showing procedures of focused three-dimensional classification of 
DkTx and proximal regions (see Methods for details). b, Atomic models for both knots of DkTx are superimposed on density maps (pink mesh). 
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Extended Data Figure 9 | Lipid co-factor and vanilloids at the vanilloid 
binding site of TRPV1. a, Chemical structure of phosphatidylinositol. 

b, Local environment of the phosphatidylinositol-binding site may 
accommodate multiple phosphatidylinositide species with phosphate 
substituents at the 3, 4 and/or 5 positions of the inositol ring (drawn in 
red). Adjacent regions of the channel are shown as ribbon diagram (grey). 
c, Tyr511 assumes two possible orientations that differ in apo versus 
agonist-bound states of the TRPV1 channel. In the apo state, one acyl 
chain of the resident phosphatidylinositol lipid (blue mesh superimposed 
with atomic model) prevents the Tyr511 side chain from assuming the
upward rotamer position. d, Density maps of vanilloids (resiniferatoxin,
red mesh; capsazepine, gold mesh) superimposed with density of the 
bound phosphatidylinositol lipid (blue mesh), suggesting that they occupy 
overlapping, but not identical sites. Atomic models for both drugs and 
their chemical structures are also shown. e, Overlap of transmembrane 
region of one TRPV1 subunit corresponding to apo (blue) and RTX/ 
DkTx-bound (orange) states. Note the relatively small conformational 
change of the voltage sensor-like domain (S1-S4, boxed region). f, Overlap 
of transmembrane region of one TRPV1 subunit corresponding to apo 
(blue) and capsazepine-bound (gold) states.

a                       Unliganded                      DkTx/RTX                        Capsazepine
                        nanodisc        amphipol        nanodisc        amphipol        nanodisc        amphipol
Defocus range (µm)      -0.7 to -2.2    -1.5 to -3.0    -0.7 to -2.2    -1.5 to -3.0    -0.7 to -2.2    -0.8 to -2.2
Number of images        1000            946             1200            ~1000           1219            1002
Motion correction       UcsfDfCorr      MotionCorr      UcsfDfCorr      MotionCorr      UcsfDfCorr      MotionCorr
Initial particle #      159193          97166           218787          148670          198831          81709
Final particle #        30689           35645           73929           36158           80725           47477
Resolution (Å)          3.28            3.28            2.95            3.8             3.43            3.8

b                               unliganded    DkTx/RTX      capsazepine
Data collection/processing
Voltage (kV)                    300           300           300
Magnification                   31000         31000         31000
Defocus range (µm)              -0.7 to -2.2  -0.7 to -2.2  -0.7 to -2.2
Pixel size (Å)                  1.2156        1.2156        1.2156
Total electron dose (e/Å²)      41            41            41
Exposure time (s)               6             6             6
Number of images                1000          1200          1219
Number of frames per image      30            30            30
Initial particle number         159193        218787        198831
Final particle number           30689         73929         80725
Resolution (unmasked, Å)        3.53          3.24          3.88
Resolution (masked, Å)          3.28          2.95          3.43
Refinement
Number of atoms                 12504         13162         11808
  Protein                       11804         12558         11708
  Ligand                        700           604           100
R.m.s. deviations
  Bond lengths (Å)              0.0082        0.0127        0.0141
  Bond angles (°)               1.25          1.42          1.37
Ramachandran
  Favored (%)                   92.8          88.86         89.62
  Allowed (%)                   12            11.02         10.12
  Outlier (%)                   0             0.12          0.26
Molprobity score                1.83          1.91          1.83

a, Comparison of imaging/data-processing variables between nanodisc and amphipol datasets. b, Statistics of three-dimensional reconstruction and model refinement.


LETTER 


doi:10.1038/nature17670 


Fission and reconfiguration of bilobate comets as 
revealed by 67P/Churyumov-Gerasimenko 


Masatoshi Hirabayashi, Daniel J. Scheeres, Steven R. Chesley, Simone Marchi, Jay W. McMahon, Jordan Steckloff,
Stefano Mottola, Shantanu P. Naidu & Timothy Bowling


The solid, central part of a comet (its nucleus) is subject to
destructive processes, which cause nuclei to split at a rate of
about 0.01 per year per comet. These destructive events are due to
a range of possible thermophysical effects; however, the geophysical
expressions of these effects are unknown. Separately, over two-thirds
of comet nuclei that have been imaged at high resolution show
bilobate shapes, including the nucleus of comet 67P/Churyumov-Gerasimenko
(67P), visited by the Rosetta spacecraft. Analysis
of the Rosetta observations suggests that 67P’s components were
brought together at low speed after their separate formation. Here,
we study the structure and dynamics of 67P’s nucleus. We find that
sublimation torques have caused the nucleus to spin up in the past
to form the large cracks observed on its neck. However, the chaotic
evolution of its spin state has so far forestalled its splitting, although
it should eventually reach a rapid enough spin rate to do so. Once
this occurs, the separated components will be unable to escape each
other; they will orbit each other for a time, ultimately undergoing a
low-speed merger that will result in a new bilobate configuration.
The components of four other imaged bilobate nuclei have volume
ratios that are consistent with a similar reconfiguration cycle,
pointing to such cycles as a fundamental process in the evolution
of short-period comet nuclei. It has been shown that comets were
not strong contributors to the so-called late heavy bombardment
about 4 billion years ago. The reconfiguration process suggested
here would preferentially decimate comet nuclei during migration to
the inner solar system, perhaps explaining this lack of a substantial
cometary flux.

Along the neck of 67P’s nucleus (the Hapi region) are two straight
cracks, each a few hundred metres long and separated from each
other by 750 metres (Fig. 1a, b). The morphology of these cracks
is distinguishable from that of other cracks observed on the surface
of the nucleus (Extended Data Fig. 1). Previous studies suggested that
the straight cracks have been created by tidal forces, but this would
have required a close Jupiter flyby within a narrow range of distances.
Episodes of past rapid rotation might also have caused stress fractures
within the nucleus, and are feasible as the nucleus’ spin rate can
undergo substantial changes per orbit, with its spin period decreasing
by 0.36 hours to 12.4 hours during its 2009 perihelion passage. To
analyse this possibility, we used elastic and plastic finite-element models
(FEMs) to study the strength and failure of 67P at different spin periods
(see Methods). We used a reduced-resolution version of the SHAP 2
shape model (see Methods), fixed the total mass at 1.0 × 10¹³ kg
(ref. 9), and assumed the material properties to be uniformly distributed,
resulting in a nuclear bulk density of 535 kg m⁻³ (ref. 13).

The elastic FEM analysis shows that, with faster spin rates, tensile
stresses peak at the observed crack locations. Figure 1c, d shows the
maximum principal stress distributions on the surface at spin periods
of 12.4 hours and 9 hours. For periods shorter than 9 hours, the peak
stresses always appear at the crack locations, with the direction of the
maximum principal stress moving away from and perpendicular to the
crack planes (Fig. 1d), implying that the cracks are of an open type
(mode I), consistent with their appearance.

The plastic FEM analyses identified three failure regimes for the 
nucleus (see Fig. 2 and Methods). Type I is compressive failure, which 
occurs at spin periods longer than about 9 hours. Compression would 
cause crushing failure around the neck’s surface, with the interior not 
at its yield stress. Type II is a crack-forming failure, which occurs at 
spin periods of around 7-9 hours, with tensional failure occurring on 
side A of the neck and portions of the interior near side B remaining 
in compression below yield stress. In type III failure, which occurs




Figure 1 | Locations of the straight cracks and stress peaks on the
surface of the Hapi region of 67P. a, b, Two straight cracks, viewed from
different angles, in the Hapi region. Data are available at the European
Space Agency (ESA) image browser (http://imagearchives.esac.esa.int),
with identification numbers N20140808T062034578ID30F22 (a) and
N201408261T074254573ID30F22 (b). Image credit: ESA/Rosetta/MPS
for OSIRIS Team MPS/UPD/LAM/IAA/SSO/INTA/UPM/DASP/IDA;
original images processed by ESA/Rosetta/SGS/PSA&ESDC to create
image for Archive Imager Browser. c, Contour plot of the maximum
component of the principal stress at a spin period of 12.4 h. d, Contour
plot of the maximum component of the principal stress at a spin period
of 9 h. The black arrows in d show the vector field of the stress component
that is larger than 40 Pa. The white lines in c, d indicate the locations of
the observed cracks. The spin axis is in the out-of-plane direction.
ANSYS Academic APDL version 15.03 was used for an elastic FEM
analysis to compute the maximum principal stress distributions, under
the assumption that the nucleus initially had no cracks on the neck.

Figure 2 | Terminal failure states for the 67P nucleus. We used a plastic
FEM analysis to compute terminal failure states. The colours show the
ratio of the present stress state to the yield stress on the cross-section.
Regions with a unity stress ratio are at yield and indicate failure. a, Type I
failure is compressive surface failure, so there are no failed regions in the
interior. The plot shows the case for a spin period of 12.4 h and a cohesive
strength of 1 Pa. b, Type II failure involves a tensile failure on side A,
but structurally stable regions on side B. The figure shows an example
at a spin period of 8 h and a bulk cohesive strength of 30 Pa. c, Type III
failure involves tensile failure across the neck region. The centrifugal
force exceeds the gravitational force in this condition. This describes
the case for a spin period of 5 h and a bulk cohesive strength of 250 Pa.
d, The orientation of the body. The maximum, intermediate and minimum
moments of the inertia axes are shown with green, red and blue arrows,
respectively.


when spin periods are shorter than about 7 hours, centrifugal forces 
exceed gravitational forces, and the tensile failure region spreads across 
the neck. 
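The three regimes can be summarized as a simple lookup on spin period. The sketch below is only an illustration of the thresholds quoted in the text: the function name is hypothetical, and treating the approximate 9-hour and 7-hour limits as sharp cut-offs is an assumption, since the actual boundaries come from the FEM analyses and depend on cohesive strength (Fig. 3).

```python
def failure_regime(spin_period_hours: float) -> str:
    """Return the failure type suggested by the text for a given spin period.

    Assumption: the approximate limits of 9 h and 7 h are used as sharp
    boundaries purely for illustration.
    """
    if spin_period_hours > 9.0:
        return "I"    # compressive failure around the neck surface
    if spin_period_hours >= 7.0:
        return "II"   # crack-forming tensile failure on side A of the neck
    return "III"      # tensile failure spreading across the whole neck


for period in (12.4, 8.0, 5.0):
    print(f"{period:5.1f} h -> type {failure_regime(period)}")
```

The three sample periods are those used for the panels of Fig. 2.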

In type I failure, compressed materials should experience shear
cracks, with additional splits occurring at the crack tips; these features
are not observed at these cracks. Moreover, compressive failure
does not occur unless the cohesive strength is less than about 10 Pa
(Fig. 3); this is much smaller than the reported compressive strength for
67P, of the order of kilopascals. Type III failure also seems unlikely
for this nucleus: given the existing cracks on the Hapi region, under a
type III regime the concentration of stress in the interior should cause
failure to propagate across the entire neck and separate the lobes (see
Extended Data Fig. 2 and Methods).

We therefore conclude that a type II failure resulted in the formation
of the observed cracks on the Hapi region. Formation conditions for
the cracks predict a bulk cohesive strength of about 10-200 Pa (Fig. 3),
compatible with the reported cohesive strength of this nucleus and
with that of other comet nuclei. Our results also predict that the present
nucleus configuration will immediately undergo a type III failure
and fission into two lobes if the spin period reaches about 7 hours.

By computing the total energy of the fissioned system, we find that,
at the 7-hour split limit, the system would have a negative total energy
and be Hill stable, which means the distance between the two lobes
would be bounded, preventing them from escaping one another
(see Methods). Instead, they would enter a brief period of orbiting
each other, eventually re-impacting at speeds less than escape speed
(~1 m s⁻¹), which should, given our strength constraints, preserve the
bilobate structure of the nucleus. We confirmed that these results were
insensitive to possible heterogeneity or variation in the bulk density
(Extended Data Figs 3-6 and Methods).
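The quoted escape speed of about 1 m s⁻¹ can be checked to order of magnitude with a back-of-the-envelope calculation. The sketch below replaces the nucleus with a sphere of equal volume, which is an assumption made only for illustration (the real shape is bilobate); the mass and bulk density are the values given in the text, and the function names are hypothetical.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
MASS = 1.0e13      # total mass of the nucleus used in the text, kg
DENSITY = 535.0    # bulk density used in the text, kg m^-3


def equivalent_radius(mass: float, density: float) -> float:
    """Radius (m) of a sphere with the same volume as mass / density."""
    volume = mass / density
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


def escape_speed(mass: float, radius: float) -> float:
    """Surface escape speed (m/s) of a sphere of given mass and radius."""
    return math.sqrt(2.0 * G * mass / radius)


radius = equivalent_radius(MASS, DENSITY)
v_esc = escape_speed(MASS, radius)
print(f"equivalent radius: {radius:.0f} m, escape speed: {v_esc:.2f} m/s")
```

Under these assumptions the equivalent radius is roughly 1.6 km and the escape speed is just under 1 m s⁻¹, in line with the figure quoted above.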

The results of our structural analysis raise the question of how the 
spin period could exceed 9 hours without transitioning beyond the

Figure 3 | Failure types and conditions at different spin periods.
The thick and narrow black lines show results obtained from the elastic
and plastic FEM analyses, respectively. If the cohesive strength is below the
thick line, cracks should appear at the stress peaks. If cohesive strength is
below the narrow line, the nucleus splits into two. Given that the cracks on
67P have already formed, the lines delineate upper and lower bounds to the
bulk cohesive strength at a given spin period. The boundary between type I
failure and type II failure is determined by when stress peaks first appear
in the neck. The boundary between types II and III is determined by when
the gravitational and centrifugal forces are balanced across the neck. The
type I compressive failure condition is less than ~10 Pa, and much lower
than the reported compressive strength, ~1 kPa. In a type III failure, the
existing cracks would experience tension in excess of their strength and fail
catastrophically. Therefore, the shaded region indicates the bulk cohesive
strength of the nucleus. Taking the lowest and highest values of this region,
we derive strength limits of between ~10 Pa and ~200 Pa. We note that the
findings shown here are relatively insensitive to variations in density and
mass distribution (see Methods). The Hill stability condition is determined
by finding when the total energy at fission is zero (see Methods).


7-hour split limit for its components to separate during its present
shape configuration. To analyse this, we applied results from an earlier
study that correlated nucleus spin variation with uniform gas emission
normal to the surface, appropriately scaled by the incident sunlight.
This is analytically similar to the YORP effect, enabling these
techniques to be applied to computing the spin acceleration for the
given nucleus shape. Since sublimation pressure varies strongly with
heliocentric distance, the spin acceleration of 67P will primarily occur
near perihelion and will be a function of φₛ, the subsolar latitude at
perihelion in the nucleus frame (see Methods). Figure 4a and b give the
scaled spin acceleration at perihelion as a function of φₛ, and as a function
of heliocentric distance. To assess the past evolution of the spin
period, we integrated 1,000 clones backwards in time for 5,000 years,
choosing initial conditions with uncertainties proportional to the present
orbital uncertainty (see Methods). The timescale used is compatible
with the activity lifetime of typical Jupiter family comets (JFCs),
which is thousands of years (ka), and is much shorter than their dynamical
lifetime of around 4 × 10² ka (ref. 22).

Even relatively distant flybys of Jupiter will modify a comet’s orbit,
which in turn can yield large changes in the subsolar latitude at
perihelion. Over multiple encounters, this yields a chaotically changing
outgassing torque and causes the subsolar latitude (Fig. 4c) and
the perihelion distance (Fig. 4d) to become uniformly distributed
within ±40° in just over 1 ka, and between 2 au and 5 au within
0.5 ka, respectively. Variation in φₛ shows that the spin rate becomes
completely random within the activity lifetime and makes it plausible
for the nucleus to pass into and out of a spin period of 7-9 hours,
forestalling spin-induced fission. Over a longer time span, this
randomization also allows the nucleus to eventually exceed a 7-hour
spin period and undergo fission and reconfiguration, also implying
that the nucleus could have undergone repeated reconfigurations
since its formation. Given that the orbit of 67P is typical among

Figure 4 | Dynamical variation of factors controlling spin acceleration.
a, Profile of the relative spin acceleration as a function of the subsolar
latitude at perihelion, φₛ. b, Intensity of sublimation (proportional
to rotational acceleration) as a function of perihelion distance.
c, d, Cumulative distribution of φₛ and perihelion distance, respectively,
obtained from 1,000 Monte Carlo simulations, starting with the present-day


JFCs, this proposed mechanism is common to the population as a
whole. 

Tidal forces experienced during close Jupiter flybys could also be
responsible for the formation of the observed cracks; however, with an
elastic FEM analysis we find that 67P must pass within three radii of
Jupiter to experience strong enough tidal forces. Our orbital simulations
show that such close flybys are very unlikely to have occurred over
the past 5 ka, with none occurring within our sample. Other processes
such as thermal fatigue and sublimation are unlikely to have formed
the observed cracks on the Hapi region, as the straight cracks are not
polygonal, a typical morphology resulting from thermal fatigue
(Extended Data Fig. 1a), and are not eroded by sublimation. Our
plastic FEM analysis shows that the parallel set of cracks on the Hathor
region (Extended Data Fig. 1b) cannot be due to rotational failure,
because the reported compressive strength is much higher than the
modelling-derived value that causes compressive failure (Fig. 3). This
reinforces the idea that these fractures pre-date the merger of the lobes
or are due to earlier reconfiguration cycles.

Finally, bilobate nuclei observed by spacecraft encounters or ground-based
radar have component volume ratios consistent with their nuclei
being trapped in a similar cycle to that of 67P’s nucleus. For bilobate
nuclei with a volume ratio between their lobes larger than about 0.2, the
total energy of these systems will be negative after fission. This means
that they are bounded in a similar way to some rubble pile asteroids;
however, additional sublimation effects could further erode or spin up
the individual lobes before re-impact. Taking material density to be
constant, we computed the volume ratios of the imaged bilobate nuclei
of comets 1P/Halley, 8P/Tuttle, 19P/Borrelly, 67P and 103P/Hartley 2;
we found that all of these nuclei had a volume ratio higher than 0.2
(see Extended Data Fig. 7 and Methods). Observed nuclei with a single
component might either be primordial, or have been part of a multi-component
object, from which smaller parts are more easily shed.
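The threshold of about 0.2 can be reproduced with a toy model of two homogeneous spheres of equal density resting in contact. This is an idealization introduced here for illustration, not the shape-based calculation described in Methods. If the pair rotates as one body at the rate where centrifugal and gravitational accelerations balance between the mass centres, the total energy is negative when (M₁ + M₂)(I₁ + I₂) < M₁M₂d², with I = 0.4MR² for each sphere and d = R₁ + R₂. Scaling the larger sphere to unit mass and radius gives a condition on the ratio q alone, which the sketch below solves by bisection (function names are hypothetical).

```python
def bound_margin(q: float) -> float:
    """Positive when two touching equal-density spheres are bound at fission.

    Units are scaled so the larger sphere has mass 1 and radius 1; the
    smaller sphere then has mass q and radius q**(1/3).
    """
    lhs = 0.4 * (1.0 + q ** (5.0 / 3.0)) * (1.0 + q)   # (M1+M2)(I1+I2)
    rhs = q * (1.0 + q ** (1.0 / 3.0)) ** 2            # M1*M2*d^2
    return rhs - lhs


def critical_ratio(lo: float = 0.01, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Bisect for the ratio at which the total energy changes sign."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bound_margin(mid) > 0.0:
            hi = mid   # bound at mid, so the threshold lies below
        else:
            lo = mid   # unbound at mid, so the threshold lies above
    return 0.5 * (lo + hi)


q_crit = critical_ratio()
print(f"critical ratio for touching spheres: {q_crit:.3f}")
```

With equal densities the mass ratio equals the volume ratio, so this simple model gives a threshold close to the value of about 0.2 quoted in the text.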

The reconfiguration cycles of short-period cometary nuclei consti- 
tute a new evolutionary process that could affect their ability to survive 
during migration into the inner solar system. The evolution sequences 
proposed here might enhance erosion and reduce the lifetime of 




state and uncertainties and propagating backwards in time for 5 ka 

(see Methods). The y-axes show: a, rotational acceleration relative to 

an ideal value (present rotational acceleration is indicated by the star); 

b, sublimation intensity relative to an ideal value (again, the star indicates 
present scaling); c, d, cumulative distribution function of the latitude (c) 
or radius (d). 


cometary nuclei, potentially explaining the negligible cometary flux
found during the late heavy bombardment.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper.

METHODS


Finite-element models. The meshes used here consist of ten-node tetrahedral elements,
generated by using the shape model made on the basis of OSIRIS imagery.
The shape model used is a reduced-resolution version of the SHAP 2 model,
which consists of 7,597 vertices and 15,190 faces. Using this polygonal model, we
developed FEM mesh models by a Delaunay triangulation algorithm and then
improved their quality by modifying the shape of each element. The mesh models
finally obtained consist of a 105,006-node mesh and a 21,289-node mesh for plastic
FEM and elastic FEM, respectively.

We assumed the nucleus to be rotating uniformly in space and to be affected 
only by gravitational and centrifugal forces. We followed the technique of ref. 32, 
defining the boundary condition such that six degrees of freedom in node dis- 
placement were constrained to cancel out translational and rotational motion. To 
consider the tidal effect of Jupiter on the 67P nucleus, we also computed the tidal 
force components that can distort the structure of the nucleus, and applied the 
elastic FEM technique of ref. 33. Furthermore, we assumed isotropic deformation. 

We used an elastic FEM to determine where failure appears initially in the 
nucleus, as in ref. 33. We calculated elastic solutions for spin periods from 5h to 
12.4h, with a 0.5-h interval between simulations. When tensile peaks first appear 
on the surface at a given spin period, we stored the locations and the maximum 
components of the principal stress. 

We carried out a plastic FEM to determine the final failure types of the nucleus.
We simulated the same spin period cases as in the elastic computation. We modelled
the yield condition of cometary materials by the Drucker-Prager criterion:

$$f(\sigma_{ij}) = \alpha I_1 + \sqrt{J_2} - k = 0$$

where i and j are indices ranging from 1 to 3, $I_1$ and $J_2$ are the stress invariants, and
$\alpha$ and $k$ are the parameters defined using a friction angle, $\theta$, and cohesive strength,
$c$, as follows:

$$\alpha = \frac{2\sin\theta}{\sqrt{3}\,(3 - \sin\theta)}, \qquad k = \frac{6c\cos\theta}{\sqrt{3}\,(3 - \sin\theta)}$$

Because the details of a flow law for cometary materials are unknown, we simplified
our model by using an associated flow law:

$$\dot{\epsilon}^{p}_{ij} = \dot{\lambda}\,\frac{\partial f}{\partial \sigma_{ij}}$$

where $\dot{\epsilon}^{p}_{ij}$ is the plastic (‘p’) strain rate, $\sigma_{ij}$ is the stress component, and $\dot{\lambda}$ is a constant
rate (the scale factor of the strain rate).

We assumed an elastic/perfectly plastic flow law in the analysis. This choice
guarantees that failure regions in the actual case are wider than those in our results.
The plastic strain rate of brittle materials is faster than that of elastic/perfectly
plastic materials. Early Rosetta results and observations suggest that materials
in the 67P nucleus are highly brittle.

We fixed Young’s modulus, Poisson’s ratio and the angle of internal friction at
1.0 × 10⁷ Pa, 0.25 and 35°, respectively. These are typical values for geological
materials, although the results obtained are not a strong function of them.
Also, in our FEM analyses we assumed that the distribution and properties of the
material are uniform in a given region and that the nucleus has no cracks initially.
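As an illustration of how the Drucker-Prager criterion is evaluated, the sketch below computes the parameters α and k from a friction angle and cohesive strength, then evaluates the yield function for a trial stress tensor. It is a minimal stand-alone example rather than the FEM implementation used in the analysis. The function names, the tension-positive sign convention and the uniaxial test stresses are assumptions; the 35° friction angle is the value given in the text, and 30 Pa is the cohesive strength quoted for the type II case in Fig. 2.

```python
import math


def drucker_prager_parameters(friction_angle_deg, cohesion_pa):
    """Return (alpha, k) from the friction angle and cohesive strength."""
    theta = math.radians(friction_angle_deg)
    denom = math.sqrt(3.0) * (3.0 - math.sin(theta))
    alpha = 2.0 * math.sin(theta) / denom
    k = 6.0 * cohesion_pa * math.cos(theta) / denom
    return alpha, k


def stress_invariants(sigma):
    """Return (I1, J2) for a symmetric 3x3 stress tensor (nested lists, Pa)."""
    i1 = sigma[0][0] + sigma[1][1] + sigma[2][2]
    mean = i1 / 3.0
    j2 = 0.0
    for i in range(3):
        for j in range(3):
            deviator = sigma[i][j] - (mean if i == j else 0.0)
            j2 += 0.5 * deviator * deviator
    return i1, j2


def yield_function(sigma, friction_angle_deg, cohesion_pa):
    """Drucker-Prager f; f >= 0 means the trial stress has reached yield.

    Assumption: tensile stresses are taken as positive.
    """
    alpha, k = drucker_prager_parameters(friction_angle_deg, cohesion_pa)
    i1, j2 = stress_invariants(sigma)
    return alpha * i1 + math.sqrt(j2) - k


def uniaxial(stress_pa):
    """Uniaxial tension along the first axis (hypothetical test state)."""
    return [[stress_pa, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]


for tension in (20.0, 50.0):
    f = yield_function(uniaxial(tension), 35.0, 30.0)
    print(f"uniaxial tension {tension:.0f} Pa: f = {f:+.2f} Pa")
```

In this example the 20 Pa state stays inside the yield surface and the 50 Pa state lies outside it, which shows how a cohesive strength of tens of pascals translates into a limit on the tensile stress.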
Identifying failure types. 

Boundary between types I and II. We determined the boundary between failure 
types I and II by searching for the spin periods at which tensile stress peaks appear 
on the surface of the neck. Because the elastic FEM simulations indicate that 
such peaks always show up first at the regions comparable to the locations of the 
observed cracks, we computed elastic solutions for the different spin period cases, 
and found the spin period condition at which the stress becomes tensile in the neck. 
Using the plastic FEM analysis, we identified that a type I failure is characterized by 
compression, while a type II failure includes tensional failure on side A. 
Boundary between types II and III, and crack propagation. A shorter spin period
causes plastic deformations to progress across the nucleus neck. The presence of 
the straight cracks in the present configuration indicates that this failure mode has 
begun, but has not progressed across the entire region. 

Because the existing cracks cause stress to be concentrated at their tips, we also
investigated the possibility of crack propagation resulting from tension. For mode I
cracks, a crack-tip stress is expressed as a function of $K_I/\sqrt{2\pi r}$, where $K_I$ is a mode I
stress-intensity factor, and r is radial distance from the crack tip. This form indicates
that a tensile loading always gives a non-zero growth rate. Therefore, we
conclude that when centrifugal forces exceed gravitational forces in the neck, existing
cracks will propagate across the neck, and the body will split into two components.
We defined this force balance point as the boundary between failure types
II and III.

To demonstrate that this force balance condition leads to crack propagation
across the neck, we use the 535 kg m⁻³ uniform density case as an example. The
force balance condition was obtained at a 7-h rotation period. From the elastic
solution for the neck cross-section, we confirmed that 90% of the region reaches
tension (Extended Data Fig. 2). Although 10% of the cross-section is still compressive
as a result of a bending moment, as the cracks propagate, the compressive
region eventually disappears.
Hill stability calculation. We analysed the initial conditions for a fissioned system
to determine the Hill stability condition. If a system is Hill stable, then the orbital
motions of the separated components are mutually bounded and cannot escape.
For a system initiated from a cohesive fission, as with 67P, the generic fate is for the
components to eventually collide with one another, albeit at low speeds that will
be less than escape speed. We derive a sufficient condition for Hill stability, which
is that the total energy (including translational and rotational kinetic energy plus
mutual potential energy) is negative. If the spin period at the boundary between
a type II and a type III failure satisfies the sufficient condition, then the system
is Hill stable.

To obtain the shortest spin period for the sufficient condition, we first sliced
the nucleus through its neck to consider the head and the body to be separated
components. Then, we numerically calculated the total energy of the system, which
is given as:

$$E = \frac{1}{2}\,\frac{M_1 M_2}{M_1 + M_2}\,\mathbf{v}\cdot\mathbf{v} + \frac{1}{2}\,\boldsymbol{\omega}\cdot(\mathbf{I}_1 + \mathbf{I}_2)\cdot\boldsymbol{\omega} - U$$

where E is the total energy, U is the mutual potential, $\mathbf{v}$ is the relative velocity vector
and equals the mutual spin rate times the distance between the mass centres, $M_1$ and $M_2$ are
the masses of components 1 and 2, $\mathbf{I}_1$ and $\mathbf{I}_2$ are the inertia tensors of components 1 and 2, and $\boldsymbol{\omega}$ is
the angular velocity vector, assumed to be oriented along the maximum moment of
inertia of the 67P nucleus. Thus the spin period at which the total energy becomes
zero can be solved for.
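The energy balance can be illustrated with a simplified stand-in for the sliced shape model: two homogeneous spheres in contact, with the mutual potential energy approximated by the point-mass value −GM₁M₂/d. The lobe masses below are hypothetical values chosen only to sum to the total mass in the text, and the function names are likewise illustrative. Because every kinetic term scales with the square of the spin rate, the zero-energy spin period has a closed form in this model.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def sphere_radius(mass, density):
    """Radius (m) of a homogeneous sphere of given mass and density."""
    return (3.0 * mass / (4.0 * math.pi * density)) ** (1.0 / 3.0)


def total_energy(m1, m2, r1, r2, separation, spin_period_h):
    """Total energy (J) of two spheres released at a given spin period."""
    omega = 2.0 * math.pi / (spin_period_h * 3600.0)
    reduced_mass = m1 * m2 / (m1 + m2)
    speed = omega * separation                       # relative speed of the mass centres
    inertia = 0.4 * (m1 * r1 ** 2 + m2 * r2 ** 2)    # I1 + I2 about the spin axis
    kinetic = 0.5 * reduced_mass * speed ** 2 + 0.5 * inertia * omega ** 2
    potential = -G * m1 * m2 / separation            # point-mass approximation
    return kinetic + potential


def zero_energy_period(m1, m2, r1, r2, separation):
    """Spin period (h) at which the total energy is exactly zero."""
    reduced_mass = m1 * m2 / (m1 + m2)
    inertia = 0.4 * (m1 * r1 ** 2 + m2 * r2 ** 2)
    omega_sq = 2.0 * G * m1 * m2 / (
        separation * (reduced_mass * separation ** 2 + inertia)
    )
    return 2.0 * math.pi / math.sqrt(omega_sq) / 3600.0


# Hypothetical lobe masses (kg) summing to the 1.0e13 kg used in the text.
M1, M2 = 7.3e12, 2.7e12
DENSITY = 535.0  # kg m^-3, bulk density used in the text
R1, R2 = sphere_radius(M1, DENSITY), sphere_radius(M2, DENSITY)
D = R1 + R2      # spheres in contact

T0 = zero_energy_period(M1, M2, R1, R2, D)
print(f"zero-energy spin period (toy model): {T0:.2f} h")
for period in (5.0, 12.4):
    energy = total_energy(M1, M2, R1, R2, D, period)
    print(f"{period:5.1f} h: E = {energy:+.3e} J")
```

Spin periods longer than the zero-energy value give a negative total energy and hence a bound pair. The numbers from this toy model are not expected to match the shape-based results.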

Variation in the bulk density. The Rosetta-derived bulk density of 67P’s nucleus is
reported to be 535 ± 35 kg m⁻³ (ref. 13). We investigated how the uncertainty range
in this measurement changes the results of our structural analysis. Assuming a
constant material density, we computed the failure type and conditions for the cases
of bulk densities of 500 kg m⁻³ and 570 kg m⁻³ (Extended Data Fig. 3). To obtain
proper sizes of the nucleus for these cases, we fixed the total mass at 1.0 × 10¹³ kg.
Because of this choice, the size for the 500 kg m⁻³ bulk density case is 7% larger
than the original size, while that for the 570 kg m⁻³ bulk density case is 6% smaller
than the original size. The results showed that all the cases experience the same
failure types (that is, types I, II and III) and that the spin period range of type II
changes by only 0.5 h.
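The quoted size changes follow directly from holding the mass fixed: the volume scales inversely with the bulk density. The short check below assumes that "size" refers to volume, which is an interpretation rather than something the text states explicitly.

```python
NOMINAL_DENSITY = 535.0  # kg m^-3, nominal bulk density from the text


def volume_change_percent(density):
    """Percentage change in volume at fixed mass relative to the nominal case."""
    return (NOMINAL_DENSITY / density - 1.0) * 100.0


for density in (500.0, 570.0):
    change = volume_change_percent(density)
    print(f"{density:.0f} kg m^-3: volume change {change:+.1f}%")
```

The results round to +7% and −6%, matching the figures given above.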

The hypothesis of a gentle contact also implies that the bulk density at the contact
zone where the two lobes touch may differ from the bulk density of the individual
lobes. Fixing the size and mass as before, we tested two extreme cases
that account for high and low bulk density of the neck. The first case considers
that the nucleus has a 300-m-width neck with a zero bulk density. The second case
includes the 300-m-width neck with a bulk density of 1,000 kg m⁻³. The elastic
analysis identified two stress peaks on the spots comparable to those for the
uniform density case (Extended Data Fig. 4).

Considering the same spin period cases as Fig. 2 (that is, 5 h, 8 h and 12.4 h), we
confirmed that these heterogeneous density cases also undergo the same failure
types as in the homogeneous conditions (Extended Data Figs 5 and 6). To obtain
plastic solutions for these cases, we determined critical cohesive strengths for different
spin periods (Extended Data Fig. 3). Because we fix the mass and volume,
if the neck region has a low (high) density, the bulk density of the lobes becomes
high (low). Such density conditions produce strong (weak) centrifugal forces and
weak (strong) gravitational forces acting on the neck, because of the relation of
the distance between mass elements. These different configurations cause the
low-density neck and the high-density neck to have higher cohesive strength or
lower cohesive strength, respectively, in order to keep the original shape. On the
basis of this consideration, we chose the critical cohesive strengths for the 5-h and
8-h cases. However, for the 12.4-h case the cohesive strength for failure is too small
to follow such trends.

Spin acceleration as a function of subsolar latitude. In the text, the spin acceleration
is given as a function of the subsolar latitude (φₛ) of the nucleus at perihelion.
For each latitude, this profile is made by computing the insolation of each surface
facet with respect to the Sun model (accounting for shadowing), summing the net
torque from the normal forces on all lit facets of the SHAP 2 shape, and then
averaging it over one rotation. The resulting curve (Fig. 4a) shows how the sign
and strength of the rotational acceleration over one orbit change with the latitudinal
location of the Sun. Combining this curve with the results of the Monte Carlo
analysis of orbital evolution (see below) establishes that, over time, the spin acceleration
varies both positively and negatively in almost the same proportions. To
avoid additional uncertainties in our analysis, we fixed the spin axis at right ascension
(RA) = 69.3° and declination (Dec.) = 64.1° (ref. 9). Even if the spin axis is
constant in the inertial frame, φₛ varies substantially owing to the orbital evolution.
Monte Carlo analysis of orbital evolution. We propagated 1,000 orbital clones of
67P to reveal the range of orbital evolution during the past 5 ka. The orbital solution
(designated JPL K084/23) included six orbital elements and three non-gravitational
acceleration parameters, namely A1, A2 and A3 (in the formulation of ref. 30).
The data set for the orbital solution extends from 6 July 1988 to 17 July 2015, and
includes 4,072 observations of the comet’s right ascension and declination. To
account for the potentially substantial long-term evolution in non-gravitational
accelerations, we randomly drew clones from a Gaussian distribution ten times
broader than that derived from the a posteriori nine-by-nine covariance matrix
associated with the orbit estimate. Each clone was propagated 5 ka backwards
in time using a dynamical model that incorporates the point mass gravitational
effects of the Sun, eight planets, the Moon, and four massive minor planets
(1 Ceres, 2 Pallas, 4 Vesta and 134340 Pluto). General-relativity perturbations from
the Sun’s monopole were included, as were the non-gravitational accelerations on
the comet according to the randomly drawn parameters (A1, A2, A3). For each
clone, we recorded the osculating orbital elements at 100-day intervals and the
close approach circumstances for all Jupiter encounters within 0.5 au. Note that
earlier studies did not include the detailed perturbation effects implemented in
our model.
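The clone-drawing step can be sketched as sampling from a multivariate Gaussian whose spread is inflated relative to the fitted covariance. The example below is only an illustration in pure Python: it uses a hypothetical two-parameter covariance instead of the nine-by-nine matrix of the orbit solution, and it interprets "ten times broader" as a tenfold scaling of the standard deviations, which is an assumption. The orbit propagation itself is not shown.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular factor L of a symmetric positive-definite matrix (L L^T = matrix)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def draw_clones(nominal, covariance, count, broadening=10.0, seed=0):
    """Draw correlated Gaussian clones around the nominal parameter vector.

    Assumption: `broadening` multiplies the standard deviations, so the
    effective covariance is broadening**2 times the fitted one.
    """
    rng = random.Random(seed)
    lower = cholesky(covariance)
    n = len(nominal)
    clones = []
    for _ in range(count):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        offset = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        clones.append([nominal[i] + broadening * offset[i] for i in range(n)])
    return clones


# Hypothetical two-parameter solution standing in for the nine fitted parameters.
NOMINAL = [1.0, 2.0]
COVARIANCE = [[4.0e-4, 1.0e-4], [1.0e-4, 9.0e-4]]

CLONES = draw_clones(NOMINAL, COVARIANCE, count=1000)
means = [sum(clone[i] for clone in CLONES) / len(CLONES) for i in range(2)]
print("sample means:", [round(m, 3) for m in means])
```

Each clone would then be handed to the dynamical model and integrated backwards in time; the Cholesky factor preserves the correlations between parameters that a diagonal draw would lose.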

Computation of the volume ratio. There are seven reported cometary nuclei
whose shapes have been detected, and five of them are considered to have bilobate
shapes. Assuming a constant material density, we computed the volume ratio
of the smaller component to the larger one for each bilobate object (Extended
Data Fig. 7) as a proxy for the mass ratio.

For comets 67P and 103P/Hartley 2, we used the degraded SHAP 2 model? and 
the EPOXI-derived shape model", respectively. For each body, we searched for 
the smallest cross-section and split the body through it. Then, we calculated the 
volume of each component. For 1P/Halley*”’ and 19P/Borrelly”°, only a few images 
were available. After finding their neck regions, we cut the bodies through them. 
Then, assuming that the lengths in the minimum and intermediate principal axes 
are equal, we computed their volume ratios. 8P/Tuttle was observed by ground- 
based radar, and its shape was well determined*!; therefore, we referred directly 
to the reported values. 


Code availability. The input file to ANSYS for elastic computation for the 9-h case 
is available upon request from M.H. (thirabayashi@purdue.edu). The code used to 
compute the dependence of generated torque on the subsolar latitude is available 
upon request from J.W.M. (jay.mcmahon@colorado.edu). We have opted not to 
make the orbital evolution code available because it is proprietary and its release 
requires a licensing agreement. 
Extended Data Figure 1 | Different types of crack observed on the 67P 
nucleus. a, Polygonal cracks on the Apis region at the edge of the large 
lobe; these cracks are presumably generated by thermal contraction”. 
b, Parallel sets of cracks on the Hathor region”>. Images available at 
http://blogs.esa.int/rosetta/2015/08/18/do-comet-fractures-drive-surface-evolution/ (a) 
and at http://sci.esa.int/rosetta/55310-hapi-and-hathor/ (b). 
Image credits: ESA/Rosetta/MPS for OSIRIS Team MPS/UPD/LAM/IAA/SSO/INTA/UPM/DASP/IDA. 
Extended Data Figure 2 | Tensile regions on the neck cross-section at 
the force balance point. The bulk density is assumed to be 535 kg m⁻³. 
a, The maximum component of the principal stress (in Pa) at the force balance 
point, which is 7 h. The region enclosed by the yellow line indicates the 
neck cross-section. The red region indicates tensile regions. b, Body 
orientation. Green arrow, maximum moment of inertia axis; red arrow, 
intermediate moment of inertia axis; blue arrow, minimum moment of 
inertia axis. 
Extended Data Figure 3 | Failure types and conditions for different bulk 
density cases. a, The bulk density is 500 kg m⁻³. b, The bulk density is 
570 kg m⁻³. 
Extended Data Figure 4 | Elastic analysis for heterogeneous density 
cases. The spin period was fixed at 9 h. The contour plots show the 
maximum component of the principal stress, with units of pascals. 
a, The bulk density of the neck is zero. The other regions have a bulk 
density of 578 kg m⁻³. b, The bulk density of the neck is 1,000 kg m⁻³, 
leading to a bulk density of 498 kg m⁻³ in the other regions. c, Schematic 
plot of the bulk density distribution. 
Extended Data Figure 5 | Plastic analysis for a neck density of 0 kg m⁻³. 
The cross-sections displayed are the same as in Fig. 2. The averaged 
density is fixed at 535 kg m⁻³. The colours describe the stress ratio. 
a, Type I failure at 12.4 h. The cohesive strength used was 4 Pa. b, Type II 
failure at 8 h. The cohesive strength was 45 Pa. c, Type III failure at 5 h. 
The cohesive strength was 300 Pa. d, Body orientation. 
Extended Data Figure 6 | Plastic analysis for a bulk density of 
1,000 kg m⁻³. The cross-sections displayed are the same as in Fig. 2. The 
averaged density is fixed at 535 kg m⁻³. The colours describe the stress 
ratio. a, Type I failure at 12.4 h. The cohesive strength used was 5 Pa. 
b, Type II failure at 8 h. The cohesive strength was 25 Pa. c, Type III failure 
at 5 h. The cohesive strength was 200 Pa. d, Body orientation. 
Extended Data Figure 7 | Volume ratios of cometary nuclei imaged 
from spacecraft encounters or ground-based radar. 
a, 67P/Churyumov-Gerasimenko (volume ratio: 0.58). Image credits: ESA/Rosetta/MPS 
for OSIRIS Team MPS/UPD/LAM/IAA/SSO/INTA/UPM/DASP/IDA. 
b, 103P/Hartley 2 (volume ratio: 0.32). Image credit: EPOXI mission MRI-VIS 
frame 5004057 from NASA's Planetary Data System. 
c, 1P/Halley (volume ratio: 0.30). Image credit: ESA/MPS. 
d, 19P/Borrelly (volume ratio: 0.22). Image credit: PIA03500, courtesy of 
NASA/JPL-Caltech. 
e, 8P/Tuttle (volume ratio: 0.47). Image credit: Arecibo Observatory scans 
800300017-19; resolution 11s x 0.5 Hz (see ref. 51 for more information). 
Mean first-passage times of non-Markovian 
random walkers in confinement 


T. Guérin1, N. Levernier2, O. Bénichou2 & R. Voituriez2,3 


The first-passage time, defined as the time a random walker takes 
to reach a target point in a confining domain, is a key quantity in 
the theory of stochastic processes’. Its importance comes from its 
crucial role in quantifying the efficiency of processes as varied as 
diffusion-limited reactions”, target search processes’ or the spread 
of diseases. Most methods of determining the properties of first- 
passage time in confined domains have been limited to Markovian 
(memoryless) processes*®”. However, as soon as the random walker 
interacts with its environment, memory effects cannot be neglected: 
that is, the future motion of the random walker does not depend 
only on its current position, but also on its past trajectory. Examples 
of non-Markovian dynamics include single-file diffusion in narrow 
channels®, or the motion of a tracer particle either attached to a 
polymeric chain’ or diffusing in simple’® or complex fluids such as 
nematics!', dense soft colloids!” or viscoelastic solutions!**!*. Here 
we introduce an analytical approach to calculate, in the limit of a 
large confining volume, the mean first-passage time of a Gaussian 
non-Markovian random walker to a target. The non-Markovian 
features of the dynamics are encompassed by determining the 
statistical properties of the fictitious trajectory that the random 
walker would follow after the first-passage event takes place, which 
are shown to govern the first-passage time kinetics. This analysis 
is applicable to a broad range of stochastic processes, which 
may be correlated at long times. Our theoretical predictions are 
confirmed by numerical simulations for several examples of non- 
Markovian processes, including the case of fractional Brownian 
motion in one and higher dimensions. These results reveal, on the 
basis of Gaussian processes, the importance of memory effects 
in first-passage statistics of non-Markovian random walkers in 
confinement. 

It has long been recognized that the kinetics of reactions is influ- 
enced by the properties of the transport process that brings reactants 
into contact!. Transport can even be the rate-limiting step, and in this 
diffusion-controlled regime, the reaction kinetics is quantified by the 
properties of the first encounter between molecules’. First-passage 
time (FPT) properties have been studied intensively in the past few 
decades!*!° and are now well understood when the stochastic motion 
of the reactants satisfies the Markov property, that is, is memoryless 
(uninfluenced by previous states, only by the current state). Under 
this assumption, exact asymptotic formulas characterizing the FPT 
of a tracer to a target located inside®”'® or at the boundary’ of a 
large confining volume have been obtained. These studies reveal that 
the geometrical parameters, as well as the complex properties of the 
stochastic transport process (such as subdiffusion), can have a strong 
impact on the reaction kinetics**”. 

However, as a general rule, the dynamics of a given reactant results 
from its interactions with its environment and cannot be described 
as a Markov process. Indeed, although the evolution of the set of 
all microscopic degrees of freedom of the system is Markovian, the 
dynamics restricted to the reactant only is not. This is typically the case 


for a tagged monomer, whose non-Markovian motion results from the 
structural dynamics of the whole chain to which it is attached?”!8, as 
observed for example, in proteins’®. Other experimentally observed 
examples of non-Markovian dynamics include the diffusion of tracers 
in crowded narrow channels® or in complex fluids such as nematics"! 
or viscoelastic solutions'*’4. Even in simple fluids, hydrodynamic 
memory effects and thus non-Markovian dynamics have been 
recently observed!°. So far, most theoretical results on the first- 
passage properties of non-Markovian processes have been limited to 
specific examples'”'®°-” or to unconfined systems, where non-trivial 
persistence exponents characterizing its long time decay have been 
calculated?*-*>. However, in many situations, geometric confinement 
has a key role in first-passage kinetics*>®”. Here, we develop a theo- 
retical framework with which to determine the mean FPT of non- 
Markovian random walkers in confinement. 

More precisely, we consider a non-Markovian Gaussian stochastic 
process x(t), defined in unconfined space, which represents the position 
of a random walker at time t, starting from $x_0$ at t = 0. As the 
process is non-Markovian, the FPT statistics in fact depend also on 
x(t) for t < 0. For the sake of simplicity, we assume that at t = 0 the 
process of constant average $x_0$ is in the stationary state (see 
Supplementary Information for more general initial conditions), with 
increments $x(t+\tau) - x(t)$ independent of t. The process x(t) is then 
entirely characterized by its mean square displacement (MSD): 
$\psi(\tau) = \langle [x(t+\tau) - x(t)]^2 \rangle$. Such a quantity is routinely measured in 
single particle tracking experiments and in fact includes all the memory 
effects in the case of Gaussian processes. At long times, the MSD 
is assumed to diverge and thus, typically, the particle does not remain 
close to its initial position. Last, the process is continuous and 
non-smooth” ($\langle \dot{x}(t)^2 \rangle = +\infty$), meaning that the trajectory is irregular and 
of fractal type, similar to standard Brownian motion. Note that the 
class of random walks that we consider here covers a broad spectrum 
of non-Markovian processes used in physics, and in particular the 
examples mentioned above. 

The random walker is now confined in a domain of volume V with 
reflecting walls, and we focus on its mean FPT to reach a target of 
position x =0 (see Fig. 1). Note that this setting also gives access to 
the reaction kinetics of a reactant in the presence of a concentration 
c=1/V of targets in infinite space. Although the theory can be devel- 
oped in any space dimension (see Supplementary Information for 
an explicit treatment of the two-dimensional and three-dimensional 
cases), it is presented here for clarity in one dimension (see Fig. 1b). 
Our starting point is the following generalization of the renewal 
equation! 


$$p(0,t) = \int_0^{t} d\tau\, F(\tau)\, p(0,t \mid \mathrm{FPT}=\tau) \qquad (1)$$


which results from a partition over the first-passage event. In this equation, 
p(0, t) stands for the probability density of being at position x = 0 
at time t, F is the FPT density and $p(0,t \mid \mathrm{FPT}=\tau)$ is the probability that 


1Laboratoire Ondes et Matière d’Aquitaine, University of Bordeaux, Unité Mixte de Recherche 5798, CNRS, F-33400 Talence, France. 2Laboratoire de Physique Théorique de la Matière Condensée, 
CNRS/Université Pierre et Marie Curie, 4 Place Jussieu, 75005 Paris, France. 3Laboratoire Jean Perrin, CNRS/Université Pierre et Marie Curie, 4 Place Jussieu, 75005 Paris, France. 


356 | NATURE | VOL 534 | 16 JUNE 2016 


© 2016 Macmillan Publishers Limited. All rights reserved 


Figure 1 | Mean FPT of a random walker in confinement. a, What is 
the mean time $\langle T \rangle$ needed for a random walker starting at $x_0$ (blue dot) 
to reach a target (red dot) in a confining volume V? Here we answer this 
question for random walkers with memory. b, In one dimension, the 
problem is to quantify the FPT of a random trajectory (in blue) in the 
presence of a reflecting boundary. We show here that $\langle T \rangle$ is controlled by 
the average trajectory $\mu(t)$ (in red) followed by the walker in the future of 
its first passage to the target. 


x = 0 at time t given that the first-passage event occurred at time $\tau$. Owing 
to the confinement, for large times p(0, t) reaches the stationary value 1/V. 
Next, subtracting 1/V on both sides of equation (1) and integrating 
over t from 0 to infinity yields an exact expression for the mean FPT: 


$$\frac{\langle T \rangle}{V} = \int_0^{\infty} dt\, \left[ q_\pi(t) - p(0,t) \right] \qquad (2)$$


where $q_\pi(t)\,dx$ is the probability of observing the random walker in the 
interval [0,dx] at time t after the first passage to the target. The exact 
formula (2) is a generalization of the expression obtained for Markovian 


processes®”* and holds for any non-smooth non-Markovian process 
[Figure 2, panels a-h. Panel titles recoverable from the plot: a, $\psi(t) = D_0(1 - e^{-\lambda t}) + Dt$; b, FBM, H = 0.4. Legend: symbols, simulations; solid lines, non-Markovian theory; dashed lines, Markovian theory.] 
Figure 2 | Mean FPT of one-dimensional non-Markovian random walks. 
Mean FPT as a function of the initial position $x_0$ (a-d) and average reactive 
trajectory $\mu(t)$ in the future of the FPT as a function of time t (e-h) for 
various one-dimensional Gaussian stochastic processes. Solid lines are 
predictions of the non-Markovian theory from equations (3) and (4); 
dashed lines are the Markovian approximation (in which $\mu(t) = 0$); and 
symbols represent numerical simulations using the circulant matrix 
algorithm (see Supplementary Information). In a and e, the correlator $\psi(t)$ 
is indicated with $D = 1$, $D_0 = 30$, $\lambda = 1$ (arbitrary units). Time is in units of 
$1/\lambda$ and lengths are in units of $(D/\lambda)^{1/2}$. In e symbols represent different 
with stationary increments (even non-Gaussian). Even if $q_\pi(t)$ is a priori 
a non-trivial quantity because it is conditioned by first-passage events, 
this equation is of great practical use in determining the mean FPT, as 
shown below. 
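The step from equation (1) to equation (2) is stated rather than shown. The following is a sketch of that algebra, not the authors' own presentation. It assumes that the FPT density is normalized, $\int_0^\infty F(\tau)\,d\tau = 1$, that the mean FPT is $\langle T\rangle = \int_0^\infty \tau F(\tau)\,d\tau$, and it writes the density after the first passage as $q_\pi$.

```latex
% Sketch: from the renewal equation (1) to the mean-FPT formula (2).
% Step 1 uses 1 = \int_0^t F + \int_t^\infty F to split the constant 1/V.
% Step 2 swaps the order of integration and sets u = t - \tau.
\begin{align*}
\int_0^\infty\! dt\,\Big[p(0,t)-\tfrac{1}{V}\Big]
 &= \int_0^\infty\! dt \int_0^t\! d\tau\, F(\tau)
    \Big[p(0,t\mid \mathrm{FPT}=\tau)-\tfrac{1}{V}\Big]
   -\frac{1}{V}\int_0^\infty\! dt\int_t^\infty\! d\tau\,F(\tau) \\
 &= \int_0^\infty\! du\,\Big[q_\pi(u)-\tfrac{1}{V}\Big]-\frac{\langle T\rangle}{V},
\qquad
q_\pi(u)\equiv\int_0^\infty\! d\tau\,F(\tau)\,p(0,\tau+u\mid \mathrm{FPT}=\tau).
\end{align*}
% Subtracting the two sides removes the 1/V terms:
%   <T>/V = \int_0^\infty dt [ q_\pi(t) - p(0,t) ].
```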

To proceed further, we first consider the large volume limit $V \to \infty$ 
(where it is assumed that all boundary points are sent to infinity) and, 
second, we assume that the stochastic process in the future of the FPT, 
defined by $y(t) = x(t + \mathrm{FPT})$, is Gaussian with mean $\mu(t)$ and the same 
covariance as the initial process x(t) (see Fig. 1b). Simulations and the 
perturbation theory below show the broad validity of this approach. 
Equation (2) then leads to: 


$$\langle T \rangle = V \int_0^{\infty} \frac{dt}{[2\pi\psi(t)]^{1/2}} \left[ e^{-\mu(t)^2/2\psi(t)} - e^{-x_0^2/2\psi(t)} \right] \qquad (3)$$

Relying on a generalization of equation (1) to link the n times proba- 
bility distribution functions of x(t), x(t), ... and the FPT density, we 
obtain an equation for the probability of the future trajectories y(t) 
leading to (see Supplementary Information for details): 


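$$\int_0^{\infty} \frac{dt}{\psi(t)^{1/2}} \left\{ \left[ \mu(t+\tau) - \mu(t) K(t,\tau) \right] e^{-\mu(t)^2/2\psi(t)} - x_0 \left[ 1 - K(t,\tau) \right] e^{-x_0^2/2\psi(t)} \right\} = 0 \qquad (4)$$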


where $\mu(0) = 0$ and $K(t,\tau) = [\psi(t+\tau) + \psi(t) - \psi(\tau)]/[2\psi(t)]$. Equation 
(4), which allows for a self-consistent determination of the mean future 
trajectory $\mu(t)$, together with equation (3), provide the mean FPT and 
constitute our main result. 
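As a sanity check on equation (3), the integral can be evaluated numerically once $\psi(t)$ and $\mu(t)$ are given. The sketch below is illustrative only: it does not solve equation (4) for $\mu(t)$, the function name and the quadrature settings are assumptions, and it simply verifies the memoryless case $\psi(t) = Kt$, $\mu(t) = 0$, for which the integral can be done by hand and gives $\langle T\rangle/V = x_0/K$.

```python
import math


def mean_fpt_over_volume(psi, mu, x0, s_min=-40.0, s_max=40.0, n=16000):
    """Evaluate <T>/V from equation (3) by trapezoidal quadrature in s = ln t.

    psi: callable returning the MSD psi(t) for t > 0
    mu:  callable returning the mean post-passage trajectory mu(t)
    x0:  initial distance to the target
    """
    h = (s_max - s_min) / n
    total = 0.0
    for i in range(n + 1):
        t = math.exp(s_min + i * h)
        p = psi(t)
        bracket = math.exp(-mu(t) ** 2 / (2.0 * p)) - math.exp(-x0 ** 2 / (2.0 * p))
        integrand = bracket / math.sqrt(2.0 * math.pi * p)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * integrand * t * h  # dt = t ds
    return total


# Markovian check: psi(t) = K t with mu(t) = 0 should reproduce x0 / K.
K, x0 = 1.0, 2.0
markov = mean_fpt_over_volume(lambda t: K * t, lambda t: 0.0, x0)
```

The logarithmic time grid is a design choice made here because the integrand has an integrable singularity at short times and a slowly decaying power-law tail at long times.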

At this stage, several remarks can be made. (1) The mean FPT 
depends linearly on the confining volume V, which extends the result 
obtained for Markovian processes®. (2) Our approach reveals the key 
role of the mean trajectory $\mu(t)$ followed by the walker in the future 
of the first-passage event. In other words, even if the real motion is 
volumes (hexagons, V = 40; squares, V = 60; and diamonds, V = 120); the 
superposition confirms that $\mu(t)$ does not depend on V. In b-d and 
f-h, fractional Brownian motion (FBM) is shown for K = 1 (arbitrary 
units). Time is in units of $V^{1/H}/K^{1/2H}$. Note that the theory is derived for the 
limit of large volume, or equivalently $x_0 \ll V$. When significant, error bars 
give the s.e.m. of the numerical simulations. Number n of simulated 
trajectories: in a and e n = 173,285 (for V = 40), n = 180,641 (for V = 60), 
and n = 96,623 (for V = 120); in b and f n = 19,224; in c and g n = 22,422; 
and in d and h n = 40,685. 
Figure 3 | Mean FPT of two- and three-dimensional non-Markovian 
random walks. Mean FPT to a target of radius a = 1 (arbitrary units) as a 
function of the initial position $r_0$ (a, c, e) and average reactive trajectory 
$\mu(t)$ in the future of the FPT as a function of time t (b, d, f) for different 
two-dimensional (2D) (a-d) and three-dimensional (3D) (e, f) Gaussian 
stochastic processes. Solid lines are predictions of the non-Markovian 
theory from equations (3) and (4); dashed lines are the Markovian 
approximation, in which $\mu(t)$ remains equal to the radius a = 1 of the 
target; and symbols represent numerical simulations using the circulant 
matrix algorithm. In a and b, the correlator $\psi(t)$ of each coordinate in two 
dimensions is indicated for $D = 1$, $D_0 = 30$, $V = 100$, $\lambda = 1$ (arbitrary units). 


stopped at the first encounter with the target, the mean FPT is con- 
trolled by the statistical properties of the fictitious path that the walker 
would follow if allowed to continue after the first encounter event. (3) 
Assuming that $\psi(t) \propto t^{2H}$ at large times, with 0 < H < 1, it can be shown 
from the asymptotic analysis of equation (4) that: 


$$\mu(t) \simeq x_0 - \frac{A}{t^{1-2H}} \quad (t \to \infty) \qquad (5)$$


where A is a coefficient depending on the entire MSD function $\psi(t)$ (at 
all timescales) and on $x_0$ (it generally has the same sign as $x_0$). Thus, for 
processes that are subdiffusive at long times (so that the MSD grows 
slower than linearly with time, H < 1/2), $\mu(t)$ comes back to the initial 
position $x_0$ of the walker, which is consequently not forgotten. On the 
contrary, asymptotically superdiffusive walkers (H > 1/2) keep going 
away from the target in the future of the FPT with a non-trivial exponent. 
These behaviours reflect the anticorrelation and correlation of 
successive steps of subdiffusive and superdiffusive walks, respectively. 
Note that even for asymptotically diffusive processes (H = 1/2), $\mu(t)$ 
tends to a non-vanishing constant, in contrast to a pure (Markovian) 
Brownian motion. (4) The importance of non-Markovian effects can 
be appreciated by comparing the mean FPT to the result obtained by 
setting $\mu(t) = 0$, which amounts to neglecting the memory of the trajectory 
before the first passage. As shown by equation (5), $\mu(t)$ is actually 
not small, so that memory effects are important. They are especially 
marked for H < 1/3, where setting $\mu(t) = 0$ in equation (3) leads to an 
infinite mean FPT, as opposed to our finite non-Markovian prediction. 
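The divergence of the Markovian approximation at small H can be read off from the large-time behaviour of the integrand of equation (3). The estimate below is a sketch that assumes $\psi(t) \simeq K t^{2H}$ at long times, with $K$ a constant prefactor used here for illustration.

```latex
% With \mu(t) = 0 the first exponential in equation (3) equals 1, and the
% second can be expanded because x_0^2 / 2\psi(t) \to 0 as t \to \infty.
\begin{align*}
\frac{1-e^{-x_0^2/2\psi(t)}}{[2\pi\psi(t)]^{1/2}}
 \;\underset{t\to\infty}{\simeq}\;
 \frac{x_0^2}{2\sqrt{2\pi}\,\psi(t)^{3/2}}
 \;\propto\; t^{-3H},
\qquad
\int^{\infty}\! t^{-3H}\,dt < \infty \iff H>\tfrac{1}{3}.
\end{align*}
```

The tail of the integral therefore converges only when H exceeds 1/3, which is why the approximation $\mu(t) = 0$ yields an infinite mean FPT for strongly subdiffusive walkers.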

We now confirm the validity of these analytical results by comparing 
them to numerical simulations of representative examples of 
non-Markovian processes defined by the MSD $\psi(t)$. First, the choice 
Time is in units of $1/\lambda$ and lengths are in units of a. In c and d, fractional 
Brownian motion in two dimensions is shown, with $K = 1$, $V = 60^2$ 
(arbitrary units). Time is in units of $a^{1/H}/K^{1/2H}$ and lengths in units of a. 
In e and f, the correlator $\psi(t)$ of each coordinate in three dimensions is 
indicated for $D = 1$, $D_0 = 10$, $\lambda = 1$ (arbitrary units). Time is in units of $1/\lambda$ 
and lengths in units of a. The confining volume is a sphere of radius R = 70 
or a cube of volume $V = 116^3$. When significant, error bars give the s.e.m. 
of the numerical simulations. Number n of simulated trajectories: in a and b 
n = 35,334; in c and d n = 37,314; and in e and f n = 16,900. 


$\psi(t) = D_0(1 - e^{-\lambda t}) + Dt \equiv \psi_D(t)$ corresponds to the generic case where 
the position x(t) is coupled to other degrees of freedom at the single 
timescale $1/\lambda$ (Fig. 2a, e). It is typically relevant to tracers moving in 
nematics"! or solutions of non-adsorbing polymers”’. 

Second, the choice $\psi(t) = Kt^{2H}$, where 0 < H < 1 and K is a positive 
transport coefficient (Fig. 2b-d, f-h), corresponds to the fractional 
Brownian motion used in fields as varied as hydrology”’, finance”? and 
biophysics'*”*; it is a particularly good description of anomalous diffu- 
sion in various physical situations such as telomere motion™ or tracer 
diffusion in viscoelastic fluids’*. This process is strongly non-Marko- 
vian, as shown by its long-range correlation functions. For fractional 


Brownian motion, the solution of equation (4) is of the form: 
$$\mu(t) = x_0\, f_H\!\left( t K^{1/2H} / x_0^{1/H} \right)$$
so that the mean FPT reads: 
$$\langle T \rangle = V \beta_H\, x_0^{1/H-1} K^{-1/2H} \qquad (6)$$


with $\beta_H$ a numerical coefficient given in Supplementary Information. 
This equation gives the explicit dependence of the mean FPT on xo and 
generalizes the results obtained for Markovian processes®, 
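The link made above between the sign of the step correlations and subdiffusion or superdiffusion follows directly from the MSD. For any process with stationary increments, the covariance of two increments can be written in terms of $\psi$ alone; the sketch below evaluates it for fractional Brownian motion. The function names are illustrative and are not taken from the authors' simulation code.

```python
def increment_covariance(psi, dt, lag):
    """Covariance of x(t+dt)-x(t) and x(t+lag+dt)-x(t+lag) for a process with
    stationary increments and mean square displacement psi."""
    return 0.5 * (psi(lag + dt) + psi(abs(lag - dt)) - 2.0 * psi(lag))


def fbm_msd(K, H):
    """MSD of fractional Brownian motion, psi(t) = K t^(2H)."""
    return lambda t: K * t ** (2.0 * H)


def successive_step_correlation(H, K=1.0, dt=1.0):
    """Correlation coefficient between two consecutive steps of duration dt."""
    psi = fbm_msd(K, H)
    return increment_covariance(psi, dt, dt) / psi(dt)


subdiffusive = successive_step_correlation(0.4)     # negative: anticorrelated steps
diffusive = successive_step_correlation(0.5)        # zero: independent steps
superdiffusive = successive_step_correlation(0.75)  # positive: correlated steps
```

For fractional Brownian motion the result reduces to $2^{2H-1} - 1$, independent of K and dt, which changes sign at H = 1/2.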

Third, the theory can be extended to higher dimensions with the 
supplementary assumption that the random walk is isotropic. Two- 
dimensional and three-dimensional versions of both of the choices of 
$\psi(t)$ considered above have been analysed explicitly (Fig. 3). 

In fact, as shown in the Supplementary Information, the theory 
is exact at order $\varepsilon^2$ when one considers an MSD function of the type 
$\psi(t) = Dt + \varepsilon\psi_1(t) + \varepsilon^2\psi_2(t) + \ldots$, where the small parameter $\varepsilon$ measures the 
deviation from a Markovian process (see Supplementary Information). 
Figures 2 and 3 reveal very good quantitative agreement between the 
analytical predictions and the numerical simulations far beyond this 
perturbative regime. Both the volume and the source-target distance 
dependence of the mean FPT are unambiguously captured by the 
theoretical analysis, at all the length scales involved in the problem. 
Note that, even if the theoretical prediction relies on large-volume 
asymptotics, numerical simulations show that it is accurate even for 
small confining systems (with various shapes of confining volumes, 
such as spherical or cubic). The very different nature of these examples 
(one, two or three dimensions, diffusive, superdiffusive or subdiffusive 
at long times...) demonstrates the wide range of applicability of our 
approach. Remarkably, the amplitude of memory effects is important 
in the examples shown in Figs 2 and 3, where the multiplicative factor 
between Markovian and non-Markovian estimates of the mean FPT can 
be up to 15 (Fig. 2c). As discussed above, this factor is even infinite for 
the fractional Brownian motion as soon as H < 1/3. Interestingly, even 
for the process defined by $\psi(t) = \psi_D(t)$ above, which is diffusive both at 
short and long times, for which one could thus expect memory effects 
to be negligible, this factor is not small (typically 5; see Fig. 2a). The 
accuracy of our analytical predictions for the mean FPT traces back to 
the quantitative prediction for the trajectories in the future of the FPT 
$\mu(t)$, as shown in Figs 2 and 3. The strong dependence of $\mu(t)$ on the 
starting point $x_0$, predicted by our approach and confirmed numerically, 
is a direct manifestation of the non-Markovian feature of the random 
walks. Together, our results demonstrate and quantify the importance 
of memory effects in the first-passage properties of non-Markovian 
random walks in confined geometry. 
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Intrinsic ferroelectric switching from first principles 


Shi Liu', Ilya Grinberg?? & Andrew M. Rappe? 


The existence of domain walls, which separate regions of different 
polarization, can influence the dielectric!, piezoelectric’, 
pyroelectric’ and electronic properties*” of ferroelectric materials. 
In particular, domain-wall motion is crucial for polarization 
switching, which is characterized by the hysteresis loop that is a 
signature feature of ferroelectric materials®. Experimentally, the 
observed dynamics of polarization switching and domain-wall 
motion are usually explained as the behaviour of an elastic interface 
pinned by a random potential that is generated by defects””*, which 
appear to be strongly sample-dependent and affected by various 
elastic, microstructural and other extrinsic effects” '*. Theoretically, 
connecting the zero-kelvin, first-principles-based, microscopic 
quantities of a sample with finite-temperature, macroscopic 
properties such as the coercive field is critical for material design 
and device performance; and the lack of such a connection has 
prevented the use of techniques based on ab initio calculations 
for high-throughput computational materials discovery. Here 
we use molecular dynamics simulations’’ of 90° domain walls 
(separating domains with orthogonal polarization directions) in 
the ferroelectric material PbTiO3 to provide microscopic insights 
that enable the construction of a simple, universal, nucleation- 
and-growth-based analytical model that quantifies the dynamics 
of many types of domain walls in various ferroelectrics. We then 
predict the temperature and frequency dependence of hysteresis 
loops and coercive fields at finite temperatures from first 
principles. We find that, even in the absence of defects, the intrinsic 
temperature and field dependence of the domain-wall velocity can 
be described with a nonlinear creep-like region and a depinning-like 
region. Our model enables quantitative estimation of coercive fields, 
which agree well with experimental results for ceramics and thin 
films. This agreement between model and experiment suggests that, 
despite the complexity of ferroelectric materials, typical ferroelectric 
switching is largely governed by a simple, universal mechanism of 
intrinsic domain-wall motion, providing an efficient framework for 
predicting and optimizing the properties of ferroelectric materials. 

In ferroelectric materials, domain walls separate regions with differ- 
ent polarization orientations. In response to an external perturbation 
that favours one polarization state over another, the domain wall will 
move to increase the size of the domain favoured by the perturbation, 
potentially leading to polarization switching of the whole material. 
The translational motion of the 180° domain wall has been studied 
experimentally”!”!*4 and theoretically!>'8. The dynamical behaviour 
of a domain wall is usually understood as an elastic interface moving 
in a fluctuating pinning potential that is created by defects”*. Under 
relatively weak electric fields (E), the propagation of domain walls at 


finite temperature (T) can be described with a creep process”"”: 
$$v \propto \exp\left[ -\frac{U}{k_{\mathrm{B}}T} \left( \frac{E_{\mathrm{C0}}}{E} \right)^{\mu} \right] \qquad (1)$$


where v is the domain-wall velocity, U is a characteristic energy barrier, 
$k_{\mathrm{B}}$ is Boltzmann's constant, $E_{\mathrm{C0}}$ is a critical field at which depinning 
occurs at 0 K and $\mu$ is the dynamical exponent determined by the nature 


of the defects. The dynamical exponent $\mu = 1$ is usually ascribed to the 
random field defects, which break the symmetry of the ferroelectric 
double-well potential”!”, whereas $\mu = 0.5$ is an indication of random 
bond disorder, which locally modifies the symmetric ferroelectric 
double-well potential depth'!. Another widely used equation that 
characterizes the switching and domain-wall motion is Merz’s law, 
which takes the form $v = v_0 \exp(-E_{\mathrm{a}}/E)$, where $v_0$ is the domain-wall 
velocity under an infinite field and $E_{\mathrm{a}}$ is the temperature-dependent 
activation field'*>. Merz’s law can be viewed as a reformulation of 
equation (1) with $\mu = 1$ and $E_{\mathrm{a}} = UE_{\mathrm{C0}}/(k_{\mathrm{B}}T)$. When the electric field 
becomes larger than the crossing field $E_{\mathrm{C0}}$, the wall experiences a 
pinning-depinning transition"®, with the velocity becoming 
temperature-independent and given by: 


$$v \propto (E - E_{\mathrm{C0}})^{\theta} \qquad (2)$$


where $\theta$ is a velocity exponent that reflects the dimensionality (D) of the 
wall. A classical theory based on a nucleation-and-growth mechanism 
was developed by Miller and Weinreich!® to explain the intrinsic ori- 
gin of Merz’s law and creep behaviour. However, the Miller-Weinreich 
model assumes the dominant role of depolarization energy during 
nucleation, which incorrectly leads to an atomically sharp triangular 
critical nucleus and implausibly high activation fields for nucleation”. 
Multiscale simulations for 180° domain walls in defect-free PbTiO3 
revealed a square critical nucleus with diffusive and bevelled interfaces 
that substantially reduces the nucleation barrier and hence leads to 
much lower activation fields for domain-wall motion, suggesting an 
intrinsic origin for $\mu = 1$ (ref. 17). 

Unlike the motion of 180° domain walls, switching processes in 
ceramics, thin films and single-crystal ferroelectrics are not well under- 
stood. The presence of a variety of extrinsic features, the possible role of 
ferroelastic effects in non-180° switching and the long (microsecond- 
millisecond) timescales typically studied for switching make it 
challenging to relate the observed hysteresis loops to the microscopic 
properties of ferroelectric materials. Because of the strong clamping 
effect of the substrate!?”°, the intrinsic dynamics of non-180° domain 
walls cannot be studied in high-quality ferroelectric thin films; instead, 
most recent experimental and theoretical studies of non-180° domain 
walls have focused on static properties!’. Here, we use a multiscale 
approach to computationally model the switching process. We first 
obtain the missing quantitative understanding of the intrinsic dynamics 
of non-180° domain walls and encapsulate it in a simple and general 
model for domain-wall speed. The model is then used in coarse-grained 
simulations on long timescales that enable accurate calculation of 
ferroelectric-switching hysteresis loops and coercive fields. 

We quantitatively estimate the velocity of a 90° domain wall in 
defect-free PbTiO3 over a wide range of temperatures and electric fields using 
large-scale molecular dynamics simulations (see Methods). Figure 1 
presents the velocity as a function of applied electric field for various 
temperatures, revealing an intrinsic ‘creep—depinning’ transition. In 
the low-field region (E < 0.5 MV cm⁻¹), the velocity strongly depends 
on temperature and has a strong nonlinear dependence on the electric 
field. In the high-field region (E > 0.5 MV cm⁻¹), the temperature 
Figure 1 | Domain-wall velocity ... velocity data at 40 K are in the flow ... to equation (2). We find $\theta = 0.72$, ... area). The solid lines are guides for ... 
dependence of the domain-wall velocity becomes weaker, as seen by 
the overlap of the velocity data obtained at different temperatures. 
Plotting ln(v) versus 1/E (Fig. 1b), we find that ln(v) has a linear 
relationship with 1/E in the low-field region. This confirms that for 
relatively low electric fields and high temperatures the velocity of the 
90° domain wall follows Merz’s law ($\mu = 1.0$), showing a creep-like 
response even in the absence of defects. The inset in Fig. 1b shows 
the temperature dependence of the activation field $E_{\mathrm{a}} = UE_{\mathrm{C0}}/(k_{\mathrm{B}}T)$ 
above 140 K. The nearly linear relationship between $E_{\mathrm{a}}$ and 1/T shows 
that $UE_{\mathrm{C0}}/k_{\mathrm{B}}$ is temperature-independent in the creep-like region 
with a value of 283 K MV cm⁻¹. By fitting the velocity data at 40 K 
with equation (2), we find that $\theta = 0.72$ and $E_{\mathrm{C0}} = 0.48$ MV cm⁻¹. The 
crossing field for the 90° domain wall is lower than that for the 180° 
domain wall (1 MV cm⁻¹) in Pb(Zr, Ti)O3 (PZT) thin films!; this 
is expected, because ab initio calculations have shown that the 90° 
domain wall in PbTiO3 is lower in energy than the 180° domain wall 
in PZT!®. The values of the dynamical exponent are the same ($\mu = 1$) 
for 90° and 180° domain walls'’. This indicates a universal intrinsic 
response for ferroelectric domain walls under low driving force. The 
observed intrinsic creep—depinning transition can be explained with 
a nucleation-and-growth mechanism. At low fields, the large size of 
the critical nucleus and the high nucleation barrier relative to ther- 
mal fluctuations make nucleation the rate-limiting step and lead to an 
Arrhenius dependence of the velocity in the creep region. At high fields, 
the nucleus size and nucleation barrier approach zero and the domain- 
wall velocity is growth dominated, resulting in near-linear dependence 
on electric fields and a weak temperature dependence. 
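
The creep-region behaviour described above can be illustrated with a short numerical sketch. It assumes Merz's law in the form v = v_0 exp(-E_a/E) with μ = 1 and E_a = (U E_C0/k_B)/T, using the fitted value U E_C0/k_B = 283 K MV cm⁻¹ quoted in the text. The prefactor v0 and the function names are hypothetical placeholders, not values or code from the simulations.

```python
import math

# Fitted constant quoted in the text: U*E_C0/k_B in K MV/cm (creep-like region).
U_EC0_OVER_KB = 283.0


def activation_field(temperature_k):
    """Activation field E_a in MV/cm, assuming E_a = (U*E_C0/k_B) / T."""
    return U_EC0_OVER_KB / temperature_k


def merz_velocity(field_mv_cm, temperature_k, v0=1.0):
    """Merz's law with mu = 1: v = v0 * exp(-E_a / E).

    v0 is a hypothetical prefactor in arbitrary units, not a fitted value.
    """
    return v0 * math.exp(-activation_field(temperature_k) / field_mv_cm)


if __name__ == "__main__":
    for t in (200.0, 300.0):
        ratio = merz_velocity(0.4, t) / merz_velocity(0.2, t)
        print(f"T = {t:.0f} K: E_a = {activation_field(t):.3f} MV/cm, "
              f"v(0.4)/v(0.2) = {ratio:.1f}")
```

In this form ln(v) is linear in 1/E with slope -E_a, which is the property used when reading the activation field off a plot of ln(v) against 1/E.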

We now develop an analytical model for nucleation at a non-180° 
domain wall based on our molecular dynamics simulations for 90° 
domain walls. As shown in Fig. 2a, a 90° domain wall in x-y coordinates 
can be viewed as a special 180° domain wall in X-Y coordinates: the 
polarization component parallel to the domain wall (P_Y) is reversed by 
180° across the boundary, while the polarization component perpendicular 
to the domain wall (P_X) remains almost unchanged (bottom 
of Fig. 2a). This transformation allows us to treat all types of non-180° 
domain walls as a 180° domain wall and allows a convenient estimate 
of the relative energies of different types of domain walls based on the 
Landau-Ginzburg-Devonshire (LGD) expression for the energy per 
unit area (σ) of the 180° domain wall (σ_180DW). Detailed examinations 
of nucleation events at the domain wall (X = 0) at low temperature 
(T = 20 K) reveal a diamond-like nucleus in the Y-Z plane (Fig. 2b), 
with substantial diffuseness at the boundary characterized by a gradual 
polarization change. With this microscopic picture of nucleation, we 
use LGD theory to relate the nucleation energy to the fundamental 
characteristics of the material (see Methods). The nucleation energy 
U_nuc includes two important energy terms: polarization-electric-field 
coupling (PE) and interfacial energy. Contrary to the assumption of the 
classical Miller-Weinreich model, the depolarization energy is quite 
small and does not make a substantial contribution to the nucleation 
energy (see Methods for a detailed analysis of elastic and depolarization 
energy). 

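At the lowest approximation, P_X and P_Z remain unchanged across 
the domain wall and, therefore, the nucleation energy depends only 
on P_Y. The profile of P_Y for a domain wall containing a nucleus of size 
l_X × l_Y × l_Z can be described as: 

$$P_Y(X,Y,Z) = \frac{2P_s}{\sqrt{2}}\, f(X, l_X, \delta_X)\, f(Y+Z, \sqrt{2}\,l_Y, \delta_Y)\, f(Y-Z, \sqrt{2}\,l_Z, \delta_Z) + \frac{P_s}{\sqrt{2}}\, g(X, l_X, \delta_X) \qquad (3)$$

where 

$$f(x,l,\delta) = \frac{1}{2}\left[\tanh\left(\frac{x+l/2}{\delta/2}\right) - \tanh\left(\frac{x-l/2}{\delta/2}\right)\right], \qquad g(x,l,\delta) = \tanh\left(\frac{x-l/2}{\delta/2}\right),$$

P_s is the bulk polarization and δ_i characterizes the diffuseness of the 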
nucleus along direction i. Figure 2c shows the polarization profile in 
the Y-Z and X-Y planes generated by equation (3). Evaluating this P_Y 
profile in the LGD energy expression for different parameter values 
(l_i and δ_i) allows us to identify the critical nucleus size and to estimate 
the nucleation activation energy (ΔU_nuc). According to the Avrami theory 
of transformation kinetics, ΔU_nuc can be related to the activation field 
in Merz's law as E_a ≈ [1/(D + 1)] [ΔU_nuc/(k_B T)] E, where D is the 
dimensionality. By applying this relation with D = 2 and using parameters 
(see Methods) obtained from our classical bond-valence potential, we 
obtain E_a values for a range of temperatures. As shown in Fig. 2d, the 
activation fields predicted from the analytical model agree well with 
molecular dynamics results. To apply the model to other types of non-180° 
domain walls, only a simple modification of the input parameters is 
required, with the necessary values obtained from first-principles density 
functional theory (DFT) calculations of the particular domain wall (see 
Methods). 
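
The nucleus profile can be evaluated with a few lines of code. The sketch below assumes that equation (3) has the tanh-based form P_Y = (2P_s/√2) f(X) f(Y+Z) f(Y-Z) + (P_s/√2) g(X), with f a smooth top-hat and g a wall step; this form, the function names and the nucleus dimensions used in the example (in units of the lattice constant) are assumptions made for illustration, not the authors' code or parameters.

```python
import math


def f(x, l, delta):
    """Smooth top-hat of width l with edge diffuseness delta (assumed form)."""
    return 0.5 * (math.tanh((x + l / 2) / (delta / 2))
                  - math.tanh((x - l / 2) / (delta / 2)))


def g(x, l, delta):
    """Domain-wall step centred at x = l/2 (assumed form)."""
    return math.tanh((x - l / 2) / (delta / 2))


def p_y(x, y, z, p_s, l, delta):
    """P_Y at (x, y, z) for a wall carrying a nucleus of size l = (lX, lY, lZ)."""
    lx, ly, lz = l
    dx, dy, dz = delta
    root2 = math.sqrt(2.0)
    nucleus = (f(x, lx, dx)
               * f(y + z, root2 * ly, dy)
               * f(y - z, root2 * lz, dz))
    return (2.0 * p_s / root2) * nucleus + (p_s / root2) * g(x, lx, dx)


if __name__ == "__main__":
    size = (1.0, 4.0, 4.0)        # hypothetical nucleus dimensions
    diffuse = (0.2, 0.2, 0.2)     # hypothetical diffuseness parameters
    print("inside nucleus:", p_y(0.0, 0.0, 0.0, 1.0, size, diffuse))
    print("far from nucleus:", p_y(0.0, 50.0, 0.0, 1.0, size, diffuse))
```

The product f(Y+Z) f(Y-Z) is what gives the nucleus its diamond-like outline in the Y-Z plane: inside the nucleus P_Y takes the value of the switched domain, and far from it the profile reduces to the bare wall.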

The availability of an analytical model that uses DFT inputs enables 
rapid estimation of hysteresis loops and coercive fields (E_c; see 
Methods). Because the structure and polarization of Ti-rich PZT 
are similar to those of PbTiO3, we compare the simulated values of 
the PbTiO3 E_c to various experimental values for PZT materials. We 
find that our theoretical coercive fields (Fig. 3a) using parameters of 
90°-domain-wall motion agree well over a large frequency range with 
the experimental E_c values (5-20 kV cm⁻¹). The E_c values based 
on 180°-domain-wall motion are quite large and exhibit the correct 
frequency dependence (Fig. 3c), in agreement with experimental results 
obtained in thin films (with thickness larger than the critical size of the 
nucleus). This suggests that the 180° switching in ceramics proceeds 
via sequential 90°-domain-wall motion, owing to the much smaller 
intrinsic nucleation barrier at the 90° domain wall. Thus, the switching 
and coercive fields in PZT are largely determined by the intrinsic properties 
of the appropriate domain-wall-motion mechanism. Similarly to 
the PZT results, we find that switching in BaTiO3 ceramics is governed 
by the motion of 90° domain walls (Fig. 3b), with the predicted coercive 
field of around 0.1 kV cm⁻¹ at 300 K close to the experimental value for 
coarse-grain BaTiO3 ceramics. 

Polarization reversal in BiFeO3 is another test of our model, owing to 
the importance of octahedral rotations and the presence of three types 
of domain walls in rhombohedrally polarized BiFeO3. DFT calculations 
revealed that the 71° domain wall has the highest energy, followed by 
the 180° domain wall, with the lowest energy for the 109° domain wall. 
The higher energy of the 71° domain wall is attributed to the mismatch 
of oxygen octahedral rotations across the domain boundary. We introduce 
a second order parameter (oxygen octahedral rotation, Θ) into our 
LGD-based nucleation-and-growth model (see Methods). Using DFT 
domain-wall energies, our analytical model predicts that E, is lowest 
for the 71° domain wall, followed by the 109° and 180° domain walls. 
The predicted coercive fields for 180° domain walls are comparable 
with experimental values in thin films. The ability of our simple 
analytical model to estimate E_c accurately indicates that the value of the 
coercive field is largely determined by the intrinsic properties of the 
material, with the nucleation barrier on the domain wall controlling 
the dynamics of polarization reversal. 

The dominant role of intrinsic domain-wall motion explains the 
consistent differences in E_c of the tetragonal and rhombohedral 
ferroelectrics. For example, an increase in E_c of approximately 80% is 
observed across the rhombohedral-tetragonal compositional phase 
transition at the morphotropic phase boundaries in lead-free 
(Ba, Ca)TiO3-Ba(Zr, Ti)O3 and Bi-rich BiScO3-Bi(Zr, Ti)O3-PbTiO3 ceramic 
systems. Analysis of our LGD nucleation model incorporating the 
changes in octahedral rotations across the 71° domain wall shows that 
the ratio of the coercive fields for 90° and 71° domain walls is approximately 
two (see Methods). This suggests that the switching in rhombohedral 
and tetragonal ferroelectrics proceeds via a multistep switching 
mechanism that involves a series of 71° and 90° steps, respectively, and 
that the higher E_c of the tetragonal ferroelectrics is a direct consequence 
of the larger nucleation energy for 90°-domain-wall motion. The unified 
framework presented here relates microscopic zero-kelvin quantities 
to macroscopic material parameters at finite temperature and thus 
suggests an avenue for rational material design. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


Molecular dynamics simulations of 90° domain walls. To understand the intrinsic 
dynamics of non-180° domain walls, we study the motion of the 90° domain wall 
in defect-free PbTiO3 as an example and then generalize the obtained results to 
other types of non-180° domain walls. We perform constant-temperature 
constant-pressure (NPT; N is the (constant) number of particles) molecular 
dynamics simulations over a wide range of temperatures and electric fields using 
a bond-valence-based classical potential and extract velocity data for the 90° 
domain wall. We use a 40 × 40 × 40 supercell with the polarization direction 
changing from [010] to [100] across the boundary (Extended Data Fig. 1a). 
Owing to the use of an orthorhombic supercell, the domains are homogeneously 
strained, making the relative angle between the orientations of the polarization 
axes of neighbouring domains exactly 90°, rather than 2arctan(a/c) as is geometrically 
required for a tetragonal ferroelectric with short-axis lattice constant a and 
long-axis lattice constant c. The electric field is applied along the [100] direction; 
this will cause the domain wall to move along the [110] direction (with velocity 
v_DW) as a result of the 90° switching of [100] dipoles to [010] dipoles at the domain 
boundary (Extended Data Fig. 1b). When dipoles in one layer of unit cells are 
switched by 90°, the wall moves by √(a² + c²)/2, and the cells initially with their 
long axis (c) along [100] will now have their short axis (a) along [100], causing 
L_[100] (the cell dimension along [100]) to be reduced by (c − a) (Extended Data 
Fig. 2a). Therefore, the domain-wall velocity v_DW can be estimated from the change 
in the cell dimension dL_[100] using: 

$$v_{DW} = \frac{dL_{[100]}}{2\,dt}\,\frac{\sqrt{a^2+c^2}}{2(c-a)} = v_s\,\frac{\sqrt{a^2+c^2}}{2(c-a)}$$

with v_s = dL_[100]/(2dt) (the factor of 1/2 is due to the presence of two walls in the 
simulated supercell). Owing to the stochastic behaviour of nucleation, 20 simulations 
with slightly different initial structures are carried out for a given temperature and 
electric field to obtain the velocity average and standard deviation. 
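
The conversion from the change in supercell dimension to a domain-wall velocity can be sketched as follows. The code assumes the relation v_DW = v_s √(a² + c²)/[2(c − a)] with v_s = dL_[100]/(2dt) as stated above; the lattice constants and the supercell change used in the example are hypothetical values chosen only so that the geometric factor lies in a plausible range, and the function names are illustrative.

```python
import math

ANGSTROM_PER_PS_TO_M_PER_S = 100.0  # 1 A/ps = 1e-10 m / 1e-12 s


def supercell_rate(delta_l_angstrom, delta_t_ps):
    """v_s = |dL_[100]| / (2 dt) in m/s; the 1/2 accounts for the two walls."""
    return abs(delta_l_angstrom) / (2.0 * delta_t_ps) * ANGSTROM_PER_PS_TO_M_PER_S


def geometric_factor(a, c):
    """sqrt(a^2 + c^2) / [2 (c - a)]: wall displacement per unit cell shrinkage."""
    return math.sqrt(a * a + c * c) / (2.0 * (c - a))


def wall_velocity(delta_l_angstrom, delta_t_ps, a, c):
    """Domain-wall velocity v_DW in m/s from the supercell dimension change."""
    return supercell_rate(delta_l_angstrom, delta_t_ps) * geometric_factor(a, c)


if __name__ == "__main__":
    a, c = 3.90, 4.40            # hypothetical lattice constants in angstrom
    d_l, d_t = 2.0, 10.0         # hypothetical: 2 A shrinkage over 10 ps
    print("v_s  =", supercell_rate(d_l, d_t), "m/s")
    print("v_DW =", wall_velocity(d_l, d_t, a, c), "m/s")
```

Dividing the resulting v_DW by the lattice constant a gives a switching rate in unit cells per unit time.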

It is known that for PbTiO3 the values of the lattice constants depend on 
temperature. The lattice constants of PbTiO3 are calculated at different 
temperatures with molecular dynamics simulations; we find that √(a² + c²)/[2(c − a)] 
depends weakly on the temperature and is in the 5-6 range (Extended Data 
Fig. 2b). This temperature dependence has a different origin from the temperature 
dependence of the domain-wall velocity. For the polarization switching process, 
the relevant kinetic quantity is v_DW/a, which is the effective (switching-related) 
domain-wall velocity (v_eff) in terms of the unit-cell lattice constant a. Therefore, 
to connect the domain-wall velocity at a given temperature with the experimentally 
observed switching rate estimated from the switching current, the obtained velocity 
v_DW must be divided by the lattice constant at that particular temperature. We 
find that v_eff exhibits the temperature dependence predicted by equation (1). 
Owing to the temperature dependence of the lattice constants, the domain-wall 
velocity measured by v_DW deviates somewhat from equation (1). We consider 
v_eff as the intrinsic velocity of the domain wall. Nevertheless, when studying 
the effect of the electric field on switching at a given temperature, v_s can also 
be used because it differs from the intrinsic, switching-related domain-wall 
velocity v_eff by a constant multiplicative factor for each temperature. The 
temperature and field dependence of v_s are presented in Fig. 1 because v_s is the 
quantity that is most easily and directly obtained from our molecular dynamics 
simulations. A v_s of 10-50 m s⁻¹ corresponds to a domain-wall velocity of 
50-250 m s⁻¹ or a change in supercell dimension of 1-5 Å per 10 ps (about 4-20 
unit cells per 10 ps). All simulations are carried out for 10-50 ps and therefore 
allow domain-wall movement that can be detected by examination of the changes 
in the supercell dimensions and total polarization. Our approach for extracting 
the domain-wall velocity from the change in supercell dimension resembles 
the experimental switching-current measurement. Experimentally, the domain-wall 
velocity is extracted by measuring the switching current, which is equivalent 
to dP/dt. We find from molecular dynamics simulations that v_s scales linearly 
with dP,/dt for various temperatures (Extended Data Fig. 2c), showing 
that v_s is a good indicator of domain-wall velocity for theory-experiment 
comparison. 
The thermal broadening of domain walls is taken into account in finite- 
temperature molecular dynamics simulations. Increased thermal broadening of 
the wall diminishes the polarization at the interface of the two domains, leading 
to a lower nucleation energy, faster domain-wall motion and a lower coercive 
field. As the temperature approaches the critical temperature, the coercive field is 
expected to become low, and the domain-wall motion will take place in the flow 
regime even at low fields. Additionally, the smearing out of the domain wall may 
lead to a transition from layer-by-layer switching to multilayer switching whereby 
several unit cells in adjacent layers switch simultaneously. 


LGD nucleation model. The nucleation energy U_nuc that captures the most 
important energy terms can be expressed as U_nuc = ΔU_E + ΔU_i, where the 
polarization-field coupling term ΔU_E is: 

$$\Delta U_E = -E \int_{-\infty}^{\infty} dX \int_{-\infty}^{\infty} dY \int_{-\infty}^{\infty} dZ\, [P_{nuc}(X,Y,Z) - P_{DW}(X,Y,Z)] \qquad (4)$$

and the interfacial energy ΔU_i is: 

$$\Delta U_i = \int_{-\infty}^{\infty} dX \int_{-\infty}^{\infty} dY \int_{-\infty}^{\infty} dZ\, \{[U_g(P_{nuc}) + U_{loc}(P_{nuc})] - [U_g(P_{DW}) + U_{loc}(P_{DW})]\}$$

Here P_nuc(X, Y, Z) and P_DW(X, Y, Z) are the polarization profiles of a domain wall 
with and without the nucleus, respectively. U_loc is the local energy penalty due to 
the deviation of the local polarization from the ground-state bulk value (P_s): 
U_loc(P) = A_loc[1 − (P/P_s)²]², where A_loc is the energy difference between the 
ferroelectric phase and the paraelectric phase. U_g is the gradient energy due to the 
polarization changes (∂_j P_i) at the domain wall: U_g(P_i) = Σ_j g_ij (∂_j P_i)², where g_ij is 
the coefficient for the gradient of the ith component of P along direction j. The 
value of g_ij can be derived from the energy and diffuseness of the domain wall. 
The contributions from elastic strain energy (ε²) and strain-polarization coupling 
(εP²) terms could be implemented into equation (3). However, we find that the 
elastic energy change is not significant (see below) and is therefore omitted in the 
following analysis. 

Elastic energy contribution to nucleation energy. We calculate the effective lattice 
constants (defined in Extended Data Fig. 1a) in X-Y coordinates and find that 
they remain almost unchanged across the domain wall (Extended Data Fig. 3a). 
This finding suggests that the elastic energy cost at domain boundaries is not 
significant in an ideal crystal. Extended Data Fig. 3b, c shows the distributions of 
strain gradient in the presence of a diamond-like nucleus (illustrated in Fig. 2b). 
It can be seen that the unit cells of the nucleus have essentially the same lattice 
constants as the rest of the PbTiO3 unit cells at the domain wall. Therefore, the 
elastic energy contribution to the nucleation energy (change in elastic energy during 
nucleation) is negligible and does not have to be treated explicitly. We have 
therefore omitted explicit strain and strain-polarization coupling terms from our 
LGD nucleation model at the lowest approximation. Additionally, although the 
LGD theory presented in the main text does not explicitly refer to elastic interactions, 
these are included implicitly. It can be shown that inclusion of strain and 
strain-polarization coupling terms merely renormalizes the fourth-order LGD 
parameter. Because the parameters for the LGD model are obtained from DFT 
calculations in which strain-polarization coupling is included, these elastic energetics 
are included in the A_loc parameter that specifies the dependence of local energy 
on local polarization. (Similarly, because the supercell size is allowed to vary in 
the NPT simulations, elastic energy is taken into account in molecular dynamics 
simulations as well.) Therefore, a deviation from the preferred value of polarization 
automatically implies a change in the unit-cell parameters, and the energy of this 
change is included in our model as the local energy penalty (U_loc). 

Analysis of the Miller-Weinreich nucleation model. The original work of Miller 
and Weinreich (ref. 15; illustrated in Extended Data Fig. 4) is based on the following 
assumptions: (1) the nucleus boundary is oriented at a 90° angle relative to the 
original domain wall; (2) the nucleus is located at the surface of the material and 
has a net non-zero boundary charge (ρ1 + ρ2 > 0); (3) the boundary of the nucleus 
has the same interface energy as that of the planar domain wall (σ_w) on which the 
nucleus is located; and (4) the σ_p parameter that characterizes the strength of the 
depolarization interactions is large relative to the magnitude of the local interface 
energy characterized by σ_w. The assumption that σ_p ≫ σ_w leads to the triangular 
(red in Extended Data Fig. 4) nucleus shape. 

Owing to the lack of reliable experimental or first-principles data for the domain-wall 
energy, the model was assumed to be correct in ref. 15 and so was used to 
parameterize the domain-wall energy with the available domain-wall velocity data. 
This allowed the fitting of the electric-field/domain-wall-velocity relationships in 
many experiments. Despite this success, two major studies have cast serious doubt 
on the model. First, first-principles values of the domain-wall energy per unit area 
(σ_w) were found to be markedly higher than the fit values and, conversely, inserting 
the accurate, calculated values into the Miller-Weinreich model gave velocities 
that were markedly lower than those observed experimentally. Second, multiscale 
modelling of the nucleation process on the domain wall for 180° domain walls shows 
that the critical nucleus is not a tall, narrow, sharp triangle, as suggested in ref. 15. 
Instead, the observed nucleus is a diffuse, bevelled square (ref. 17). We show that rather 
than the σ_p ≫ σ_w limit assumed in ref. 15, the actual nucleation takes place in 
the σ_w ≫ σ_p limit, with the local interface energy playing the dominant role and 
governing the energetics of nucleation and growth. 
Reduced depolarization energy. For simplicity, we discuss the relative energies 
of the depolarization and local interface terms adopting the triangular shape and 
form of the nucleus energy expression of ref. 15 (presented in Extended Data 
Fig. 4), so that these terms are discussed in the framework traditionally used to 
model nucleation on the domain wall. Four factors contribute to the reduced role 
of depolarization energy in nucleation. 

First, the bevelled shape of the nucleus effectively reduces σ_w. Because the 
depolarization energy of the Miller-Weinreich model arises from the electrostatic 
interactions between the charges along the boundary of the nucleus, the magnitude of 
σ_p exhibits a logarithmic dependence on the width of the nucleus (a). Although 
the boundary of the nucleus was assumed to be sharp in ref. 15 and at a 90° angle 
to the domain wall, the actual nucleus boundary has a bevelled shape, as shown in 
previous molecular dynamics studies (ref. 17). This decreases the effective domain-wall 
area or, alternatively, the effective local-domain-wall energy (σ_eff,w) for a given 
nucleus of width a. According to equation (9) in ref. 15 (also presented in Extended 
Data Fig. 4), the magnitude of the width of the critical nucleus a* is determined by 
the ratio between σ_w and the PE terms both in the limit σ_p ≫ σ_w and in the limit 
σ_w ≫ σ_p. Thus, for all cases, a decrease in σ_w leads to a smaller critical width a* and 
therefore a smaller critical depolarization energy σ_p*. The logarithmic dependence 
of σ_p is not weak for the small nuclei observed in our molecular dynamics 
simulations. Therefore, a decrease in the local interface energy due to the bevelled 
shape of the nucleus, which favours smaller critical nucleus size, also substantially 
decreases the magnitude of σ_p*. 

Second, the dielectric constant is enhanced at the domain wall and therefore the 
screening at the domain wall is stronger than in the bulk of the material. Recent 
experimental and theoretical work has shown that the dielectric constant at the 
domain wall is larger than that in the bulk. This is confirmed by our molecular 
dynamics simulations that show that the local dipole fluctuations and therefore the 
dielectric constant at the 90° domain wall are enhanced by a factor of two relative to 
the bulk value. Owing to the presence of the dielectric constant in the denominator 
of the formula for σ_p, the actual σ_p value is then reduced by another factor of two 
relative to the original Miller-Weinreich estimate. 

Third, the diamond shape of the nucleus shows an interaction cancellation 
effect. An additional effect is present for the elongated-diamond-like nuclei found 
in this work. Unlike the Miller-Weinreich model, which is not charge neutral, the 
elongated diamond shape observed in our molecular dynamics simulations exhibits 
both positive (ρ1 and ρ2) and negative (−ρ3 and −ρ4) boundary charges 
(Extended Data Fig. 5) so that the total charge at the nucleus boundary (Q_tot) is 
zero. Therefore, the repulsive energy penalty due to the interaction between ρ1 and 
ρ2, and between −ρ3 and −ρ4, is cancelled by the attractive energy gain of the 
interaction between ρ1 and −ρ3, and between ρ2 and −ρ4. This changes the 
dependence of σ_p on a from ln[2a/(eb)] to ln[a/(eb)] (the 4P_s²b ln(2)/ε contribution 
to σ_p, in which ε is the dielectric constant, arises from the interaction between 
the charges on the two opposite sides of the triangle; see Extended Data Fig. 4 for 
definitions of e and b, and the text following equation (4) in ref. 15). 
Although this change would have a minor effect on the large nucleus assumed in 
ref. 15, it is highly important for the small nucleus observed in our molecular 
dynamics simulations. 

Finally, the boundary of the nucleus has a much smaller depolarization charge. 
We find that the average boundary charge between the nucleus and the original 
domain as integrated from the polarization changes on the 90° domain wall 
observed in our molecular dynamics calculations (Extended Data Fig. 5) is about 
two times smaller (ΔP = 0.7 C m⁻²) than that predicted by the sharp polarization 
change (ΔP = 2P_s = 1.41 C m⁻²) that would be used in a Miller-Weinreich-like 
model. Such a small polarization change is due to the greatly decreased value of 
P_Y at the domain wall relative to the bulk value. First-principles calculations (ref. 16) 
show that the diffuseness of the 90° domain wall means that P_Y at the domain-wall 
layer is only about 50% of the bulk value. This large decrease in P_Y is also found in our 
calculations (Fig. 2a). It is this domain-wall layer that undergoes the nucleation and 
growth process governing the domain-wall motion, and therefore the appropriate 
value of P to be used for estimating the depolarization charge is much smaller than 
the Miller-Weinreich estimate based on the bulk value P_s. The much smaller charge 
generated at the boundary of the nucleus decreases the strength of electrostatic 
interactions and σ_p by a further factor of approximately four. 

Despite the small σ_p, our nucleus still exhibits an elongated shape; this is due 
to the greater magnitude of the local energy σ_w for the domain wall at which P 
changes along the P direction than that for the domain wall at which P changes 
along a direction transverse to the P direction, as found in ref. 17 for 180°-domain-wall 
motion. This is also unlike the assumption in ref. 15 that σ_w is the same as the 
energy of the flat domain wall for all nucleus boundaries. 

In summary, rather than the σ_p ≫ σ_w limit assumed in ref. 15, the actual 
nucleation takes place in the σ_w ≫ σ_p limit, with the local interface energy playing the 
dominant role and governing the energetics of nucleation and growth. This not 
only justifies our analytical model that neglects the small depolarization energy 
term, but also represents a new understanding of the physics that is important for 
ferroelectric switching. 

Quantitative analysis of σ_p reduction. We quantitatively evaluate the impact of 
the effects described above (bevelled shape, high dielectric constant, cancellation 
effect and small depolarization charge) on the depolarization energy term (σ_p). 
To take the modification of the boundary structure into account, we write down a 
modified version of the Miller-Weinreich formula: 


$$U_{nuc} = -2P_s E\,a l c + 2 s \sigma_w c \sqrt{a^2 + l^2} + U_d$$


8f° Pec? g? a 
Ui= iP In fa = 2opba?/I 
fe 1 eb 


Re of 2" (5) 


° fe eb 


where σ_w is the energy of the planar 90° domain wall, s = 0.41 is a factor that 
accounts for the reduction in the interface area of the nucleus due to its bevelled 
shape, as described previously (ref. 17; σ_eff,w = sσ_w), f_c is the scaling factor between the 
actual charge at the nucleus and the boundary charge assumed in the Miller-Weinreich 
model, f_ε is the scaling factor between the values of the dielectric 
constant (ε) at the domain wall and in the bulk, and f_d is a factor reflecting the 
effect of the interactions between the charged domain boundaries at the net-neutral 
(Q_tot = 0) and net-charged (Q_tot ≠ 0) boundaries of the nucleus, with f_d = 2 for the 
original, charged, triangular, Miller-Weinreich nucleus and f_d = 1 for a net-neutral, 
diamond-like nucleus. 

To determine the dimensions and the energy of the critical nucleus, we evaluate 
U_nuc for a wide range of a and l values and identify those that give the lowest energy 
for each nucleus area A = al. Here, we use the DFT σ_w value of 35 mJ m⁻² for the 
90° domain wall and standard parameters for PbTiO3 (dielectric constant ε = 60, 
bulk polarization component in the plane of the 90° domain wall P_s = 0.53 C m⁻², 
b = 3.9 Å, c = 4 Å and e = 2.718). The plots of the nucleus energy versus area (A) 
for different values of s, f_c, f_ε and f_d under an applied field of 0.1 MV cm⁻¹, which 
is typical of the low range of field magnitudes used in molecular dynamics 
simulations, are shown in Extended Data Fig. 6. We also show the dependence of the 
nucleus aspect ratio (l*/a*, where l* is the length of the critical nucleus; see Extended 
Data Fig. 5) on the ratio of σ_p and σ_w, and the σ_p values obtained for different 
values of s, f_c, f_ε and f_d. 

Examination of Extended Data Fig. 6 shows several important differences 
between the results of the classical Miller-Weinreich approach and the results 
obtained for a Miller-Weinreich-like nucleus with realistic boundaries. First, 
even for s = f_c = f_ε = 1, the obtained a* = 12.5b and l* = 47b values are relatively 
small. For such a small a*, the dependence of σ_p on ln[a/(eb)] is not weak 
and, therefore, reduction of a* due to the effects described above (smaller 
effective domain-wall area due to bevelled shape) has a strong effect on σ_p. Taken 
together, the various effects lead to a reduction in σ_p by a factor of about 30 
relative to the Miller-Weinreich estimate for nucleation at the 90° domain wall 
under an applied field of 0.1 MV cm⁻¹. This results in σ_p ≈ 5.7 mJ m⁻², much 
smaller than the local interface energy characterized by the effective domain-wall 
energy σ_eff,w = 15.4 mJ m⁻². The small value of σ_p justifies our neglect of 
electrostatic interactions in the analytical model of the nucleus, and the much 
smaller σ_p/σ_w ratio corresponds to an aspect ratio of the critical nucleus (l*/a*) 
that is close to one. 

As illustrated in Extended Data Fig. 7, similar effects can be obtained for nucleation 
on the 180° domain wall under an applied field of 0.3 MV cm⁻¹ using the DFT 
180°-domain-wall σ_w value of 132 mJ m⁻² and standard parameters for PbTiO3 
(dielectric constant ε = 60, bulk polarization P_s = 0.75 C m⁻², b = 3.9 Å, c = 4 Å 
and e = 2.718). 
Model parameters for non-180° domain walls. The nucleation model discussed 
here is similar to the model in ref. 17. The mapping scheme discussed therein 
allows the treatment of a non-180° domain wall as a generalized 180° domain 
wall lying in the Y-Z plane with polarization changing from +Py to —Py along X. 
The following five parameters are required to estimate the nucleation energy 
at the domain wall under a given temperature T: P.(T), Aloc(T), gyy» gyx and 
Syz» where: 


Py(T) = A(T) 


Aige(0) = 7*Atoc(0 


Da 
Here P_s is the total local polarization, γ is the fraction of the polarization variation 
across the domain boundary (for example, γ = √2/2 for a 90° domain wall), A_loc 
is the energy difference between the ferroelectric phase and the high-symmetry 
paraelectric phase, σ^DW_XY is the energy of a domain wall with normal along X and 
neighbouring dipoles along Y, and δ_X is the polarization diffuseness parameter 
over which the polarization changes across the domain boundary. By analogy, σ^DW_YY 
is the energy of a domain wall with normal along Y and neighbouring dipoles 
along Y (head-to-head or tail-to-tail domain wall), and δ_Y is the associated 
diffuseness parameter. P_s(0) and A_loc(0) are extracted from zero-kelvin DFT calculations. 
The temperature dependence of P_s(T) is taken from experiments when available. 
The values of g_YY and g_YZ can be determined on the basis of the domain-wall energy 
(σ_DW, calculated from DFT) or diffuseness parameters (δ, calculated from molecular 
dynamics). In practice, g_YY, g_YX and g_YZ are of the same order and therefore 
g_YX ≈ g_YY is a useful approximation. 

For BaTiO3, DFT calculations using the PBEsol density functional with 
a = 3.986 Å and c/a = 1.01 give A_loc(0) = 3.48 × 10⁷ J m⁻³, σ_180DW = 11 mJ m⁻², 
σ_90DW = 3.89 mJ m⁻², P_s(0) = 0.283 C m⁻² and g_YY = 0.61 × 10⁻¹¹ m³ F⁻¹. These 
parameters are used for simulating the hysteresis loop in Fig. 3b. For PbTiO3, 
we use experimental lattice constants (a = 3.9 Å and c = 4.15 Å) for DFT 
calculations with PBEsol and obtain A_loc(0) = 5.05 × 10⁸ J m⁻³, σ_180DW = 175 mJ m⁻², 
σ_90DW = 67 mJ m⁻² and g_YX = 1.21 × 10⁻¹¹ m³ F⁻¹. The temperature dependence of 
polarization is taken from ref. 35, with P_s(0) = 0.872 C m⁻². These parameters are 
used for predicting the coercive fields of PbTiO3-based ceramics and thin films 
in Fig. 3a, c. 
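
As a consistency check on these parameters, the energy of a 180° wall can be estimated from the local and gradient energy densities U_loc(P) = A_loc[1 − (P/P_s)²]² and U_g = g(dP/dx)². The sketch below assumes a one-dimensional profile P(x) = P_s tanh(x/w), for which minimizing over w gives σ = (8/3) P_s √(A_loc g); the tanh profile, this closed form and the function names are assumptions made for illustration, not the authors' procedure.

```python
import math


def wall_energy_closed_form(p_s, a_loc, g):
    """Energy per unit area (J/m^2) of a tanh-profile wall at its optimal width."""
    return (8.0 / 3.0) * p_s * math.sqrt(a_loc * g)


def optimal_width(p_s, a_loc, g):
    """Width w (m) minimizing the wall energy for P(x) = p_s * tanh(x / w)."""
    return p_s * math.sqrt(g / a_loc)


def wall_energy_numeric(p_s, a_loc, g, width, n=20001, span=20.0):
    """Trapezoidal integral of U_loc + U_g across the wall (J/m^2)."""
    x_max = span * width
    h = 2.0 * x_max / (n - 1)
    total = 0.0
    for i in range(n):
        x = -x_max + i * h
        p = p_s * math.tanh(x / width)
        dp_dx = p_s / (width * math.cosh(x / width) ** 2)
        u = a_loc * (1.0 - (p / p_s) ** 2) ** 2 + g * dp_dx ** 2
        total += (0.5 if i in (0, n - 1) else 1.0) * u * h
    return total


if __name__ == "__main__":
    # BaTiO3 values quoted in the text, in SI units.
    p_s, a_loc, g = 0.283, 3.48e7, 0.61e-11
    w = optimal_width(p_s, a_loc, g)
    print(f"closed form: {1e3 * wall_energy_closed_form(p_s, a_loc, g):.2f} mJ/m^2")
    print(f"numeric:     {1e3 * wall_energy_numeric(p_s, a_loc, g, w):.2f} mJ/m^2")
```

With the BaTiO3 inputs both routes give about 11 mJ m⁻², in line with the quoted σ_180DW, which suggests that the gradient coefficient was fixed from the 180° wall energy in this way.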
LGD model for BiFeO3 and other rhombohedral ferroelectrics with O6 
rotations. 71°, 109° and 180° domain walls are all observed in BiFeO3. The 
energetics of these three types of domain walls have been investigated with DFT in 
several studies (refs 36-38). In ref. 36, σ_71DW = 152 mJ m⁻², σ_109DW = 62 mJ m⁻² and 
σ_180DW = 73 mJ m⁻² were reported using LDA+U. In ref. 38, σ_71DW = 128 mJ m⁻², 
σ_109DW = 33 mJ m⁻² and σ_180DW = 98 mJ m⁻² were reported with GGA+U. From 
equation (5), we deduce that σ_DW ∝ P_Y √(A_loc g_YY). Assuming the polarization 
gradient coefficient is isotropic, the energy of a non-180° domain wall (σ_DW) can be 
related to that of a 180° domain wall: σ_DW = γ³σ_180DW. Therefore, for a given 
ferroelectric, σ_71DW : σ_90DW : σ_109DW : σ_180DW = 0.192 : 0.354 : 0.544 : 1. This relationship 
works well for 90° and 180° domain walls in BaTiO3 and PbTiO3 (ref. 16), and 
reasonably well for 109° and 180° domain walls in BiFeO3 (refs 36, 38). However, 
the 71° domain wall is found to have the highest energy in BiFeO3, which is 
attributed to the mismatch of oxygen octahedral rotation across the domain 
boundary. To capture this feature, we introduce a second order parameter, 
oxygen octahedral rotation (Θ), into the LGD model of BiFeO3. Therefore, the 71° 
domain wall in BiFeO3 has the following extra energy term: 


$$U_\theta = \frac{1}{2} K \int dX \int dY \int dZ \, \left[\theta_{DW}(X, Y, Z) - \theta_{bulk}(X, Y, Z)\right]^2$$

where K is the harmonic angle constant and $\theta_{bulk}(X, Y, Z) \approx 8°$ (ref. 38). The value
of K (6.106 x 10°J m~?rad~*) is optimized such that the LGD model reproduces 
the DFT value of $\sigma_{71DW}$ with the gradient coefficient ($g_{YX}$ = 0.32 × 10⁻¹¹ m³ F⁻¹)
estimated from $\sigma_{109DW}$. The following term is then added to equation (3) when
estimating the nucleation energy:

$$\Delta U_\theta = \frac{1}{2} K \int dX \int dY \int dZ \, \left[\theta_{nuc}(X, Y, Z) - \theta_{DW}(X, Y, Z)\right]^2$$


where an analytical equation similar to equation (4) is used to describe the angle
profile $\theta_{nuc}(X, Y, Z)$. Other parameters are $A_{loc}(0)$ = 5.81 × 10⁸ J m⁻³,
$P_s(0)$ = 0.987 C m⁻² and $T_c$ = 1,120 K.
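The domain-wall energy ratio 0.192 : 0.354 : 0.544 : 1 quoted above coincides numerically with sin³(θ/2) for a wall of angle θ. The sketch below checks this. The closed form is an inference from the quoted numbers, not a formula stated in the text, and the angles 70.53° and 109.47° are the ideal pseudocubic values assumed here for the nominal 71° and 109° walls.

```python
import math

# Ideal angles between polarization directions on either side of the wall.
# "71" and "109" are nominal labels; the exact values below are an assumption.
WALL_ANGLES_DEG = {
    "71DW": math.degrees(math.acos(1.0 / 3.0)),    # ~70.53 degrees
    "90DW": 90.0,
    "109DW": math.degrees(math.acos(-1.0 / 3.0)),  # ~109.47 degrees
    "180DW": 180.0,
}

def wall_energy_ratio(angle_deg):
    """Hypothetical scaling factor sin^3(angle/2), relative to a 180-degree wall."""
    return math.sin(math.radians(angle_deg) / 2.0) ** 3

ratios = {name: wall_energy_ratio(a) for name, a in WALL_ANGLES_DEG.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")

# Consistency check against the BaTiO3 values quoted in the text (mJ m^-2):
# an 11 mJ m^-2 180-degree wall scaled to a 90-degree wall.
sigma_90_from_180 = 11.0 * ratios["90DW"]
print(f"BaTiO3 90DW estimate: {sigma_90_from_180:.2f}")
```

Under this assumed scaling the BaTiO3 estimate reproduces the quoted 3.89 mJ m⁻², whereas the BiFeO3 71° wall deviates, which is the discrepancy the extra rotation term is introduced to capture.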

Coarse-grained simulation of P-E hysteresis loop. The coercive field reflects the
ease of domain reversal and is one of the most important characteristic
parameters of ferroelectrics for practical applications. For the domain-reversal process
achieved via domain-wall motion, the change in the polarization under an applied
electric field directly correlates with the distance moved by the domain wall, the
velocity of which can be estimated using Merz’s law. We extract the pre-exponential
factor $v_0$ in Merz’s law from molecular dynamics simulations in the creep-like
region and obtain $E_a$ for PbTiO3 from the LGD model with parameters calculated
with DFT PBEsol*’. With these values of $v_0$ and $E_a$, we then simulate the hysteresis
loops at 300 K and obtain the frequency dependence of $E_c$ for varying domain sizes
(Fig. 3a). Following the experimental set-up used in most hysteresis-loop
measurements, a triangular electric field E(t), with frequency f, maximum magnitude
$E_0$ and time t, is used in the simulation:


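$$E(t) = \begin{cases} 4fE_0t, & 0 \le t < \dfrac{1}{4f} \\ -4fE_0t + 2E_0, & \dfrac{1}{4f} \le t < \dfrac{3}{4f} \\ 4fE_0t - 4E_0, & \dfrac{3}{4f} \le t < \dfrac{1}{f} \end{cases}$$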


At t = 0, the domain of size d is fully poled with saturation polarization $-P_s$.
Assuming the domain reversal is achieved via domain-wall motion, the
polarization at time t can be calculated using:


$$P(t) = -P_s + \frac{2P_s}{d} \int_0^t v(t')\, dt' \qquad (6)$$


where v(t) is the domain-wall velocity at time t and is calculated using Merz’s
law: $v(t) = v_0 \exp[-E_a/E(t)]$. When the value of P(t) obtained from equation (6)
is larger than $P_s$ (such that the domain is already fully reversed), P(t) is set to $P_s$.
Plotting P(t) with respect to E(t) gives the hysteresis loop. The coercive field $E_c$
is the magnitude of the electric field when P(t) = 0. On the basis of the
molecular dynamics simulation results, we used $v_0$ = 300 m s⁻¹ for predicting
room-temperature coercive fields. We find that the coercive field is not sensitive to the
value of $v_0$, as demonstrated by the moderate change in coercive fields in response
to orders of magnitude change in d (which is equivalent to changing $v_0$ for fixed
d) shown in Fig. 3. This indicates that the magnitude of the coercive field is largely
determined by the activation field.
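The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the polarization is taken to grow as 2P_s/d times the wall displacement, Merz's law is applied to |E| with the sign of E setting the direction of motion, and the values of `E_A`, `E_0`, `D` and `P_S` are placeholders. Only v_0 = 300 m s⁻¹ is taken from the text.

```python
import math

def triangular_field(t, f, e0):
    """Triangular field E(t) with frequency f and amplitude e0 (periodic in 1/f)."""
    phase = (t * f) % 1.0
    if phase < 0.25:
        return 4.0 * e0 * phase
    if phase < 0.75:
        return 2.0 * e0 - 4.0 * e0 * phase
    return 4.0 * e0 * phase - 4.0 * e0

def merz_velocity(e, v0, ea):
    """Merz's law v0*exp(-ea/|E|); the sign of E sets the direction (assumption)."""
    if e == 0.0:
        return 0.0
    return math.copysign(v0 * math.exp(-ea / abs(e)), e)

def simulate_loop(f, e0, v0, ea, d, ps, steps_per_period=20000, periods=2):
    """Integrate the wall displacement in time; return lists of E(t) and P(t)."""
    dt = 1.0 / (f * steps_per_period)
    p = -ps  # fully poled at t = 0
    fields, pols = [], []
    for i in range(steps_per_period * periods):
        e = triangular_field(i * dt, f, e0)
        p += 2.0 * ps * merz_velocity(e, v0, ea) * dt / d  # assumed prefactor 2*Ps/d
        p = max(-ps, min(ps, p))  # clamp once the domain is fully reversed
        fields.append(e)
        pols.append(p)
    return fields, pols

def coercive_field(fields, pols):
    """Field magnitude at the first upward zero crossing of P(t)."""
    for k in range(1, len(pols)):
        if pols[k - 1] < 0.0 <= pols[k]:
            return abs(fields[k])
    return None

# v0 is quoted in the text; E_A, E_0, D and P_S are illustrative placeholders (SI units).
V0, E_A, E_0, D, P_S = 300.0, 5.0e8, 1.0e8, 1.0e-6, 0.872
ec_fast = coercive_field(*simulate_loop(1.0e3, E_0, V0, E_A, D, P_S))
ec_slow = coercive_field(*simulate_loop(1.0e2, E_0, V0, E_A, D, P_S))
print(f"E_c at 1 kHz: {ec_fast:.3e} V/m, at 100 Hz: {ec_slow:.3e} V/m")
```

With these placeholder values a tenfold change in frequency shifts the coercive field only modestly, which is the qualitative behaviour the text attributes to the dominance of the activation field.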

Comparison of coercive fields for tetragonal and rhombohedral ferroelectrics.
The values of $P_s$, $A_{loc}(0)$, $\sigma_{90DW}$, $g_{YX}$ and $g_{YZ}$ are derived from PbTiO3. These
parameters are used for simulating the hysteresis loop and coercive field of tetragonal
(T) ferroelectrics. The value of $\sigma_{71DW}$ is estimated as 0.542$\sigma_{90DW}$ (as explained
above). To account for the possible octahedral rotations across the 71° domain wall,
we use the angle constant derived from BiFeO3 when simulating the coercive field
for rhombohedral (R) ferroelectrics; we find that $E_c^T/E_c^R \approx 1.8$.

Effect of supercell size. We carried out a benchmark study on the effect of
supercell size (Extended Data Fig. 8). We calculated the domain-wall velocity with
40 × 40 × 40, 50 × 50 × 40, 60 × 60 × 40 and 65 × 65 × 40 supercells at 200 K and
240 K. The key finding is that the values obtained with the 40 × 40 × 40 supercell
do not substantially deviate from values found using the larger supercells (within
10 m s⁻¹). Most importantly, the v–E slope is similar for supercells of different
sizes, showing that the domain-wall dynamics obtained with a 40 × 40 × 40
supercell are robust against supercell size.
Extended Data Figure 1 | Large-scale molecular dynamics simulations
of 90°-domain-wall motions. a, Schematic of a 40 × 40 × 40 supercell with
90° domain walls used in molecular dynamics simulations. The colours
of the domains correspond to the polarization (P) wheel shown at the
bottom. White arrows represent the polarization directions of domains.
b, Simulated domain evolution under a [100]-oriented electric field (E).
The dashed yellow lines show the positions of 90° domain walls. The
electric field is turned on at time $t_0$. The domain-wall velocity $v_{DW}$ along
[110] (yellow arrows) is estimated on the basis of the change in the
supercell dimension ($L_x$) along [100] from $t_0$ to $t_0 + \Delta t$. The black arrows
scale with the local dipole of each unit cell. The domain wall motion is
achieved via the 90° switching of [100] dipoles to [010] dipoles.
Extended Data Figure 2 | Lattice constants of supercells used in
molecular dynamics simulations. a, Pb (orange) and Ti (blue) sublattices
in a PbTiO3 supercell with 90° domain walls. The boundaries are marked
by green lines. $a_X$ and $a_Y$ are effective lattice constants of the domain-wall
unit cell defined in the transformed X-Y coordinates and shown by the
red rectangle. When dipoles in one layer of unit cells switch by 90° (c → a),
the wall moves by $(a^2 + c^2)^{1/2}/2$ along the [110] direction. b, Temperature (T)
dependence of $\sqrt{a^2 + c^2}/[2(c - a)]$ obtained from molecular dynamics
simulations (squares). It depends on temperature weakly (blue line). c, Plot
of polarization change ($dP_x/dt$) versus cell-dimension change ($v_x$). The
solid curves show linear fits at 100 K (blue) and 240 K (red).
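The geometric factor in panel b can be read as a conversion between the rate of change of the supercell dimension and the domain-wall velocity: each switched layer changes the cell length by (c − a) while the wall advances by √(a² + c²)/2. The sketch below evaluates that factor. This reading of the caption is an interpretation, and the experimental PbTiO3 lattice constants quoted in Methods are used only as example inputs (the molecular dynamics supercells will generally differ).

```python
import math

def wall_step(a, c):
    """Distance a 90-degree wall advances along [110] when one layer switches (c -> a)."""
    return math.sqrt(a**2 + c**2) / 2.0

def velocity_conversion_factor(a, c):
    """sqrt(a^2 + c^2) / [2 (c - a)]: wall displacement per unit change in cell length."""
    return wall_step(a, c) / (c - a)

# Example inputs in angstroms (experimental PbTiO3 values quoted in Methods).
A_PTO, C_PTO = 3.9, 4.15
factor = velocity_conversion_factor(A_PTO, C_PTO)
print(f"wall step = {wall_step(A_PTO, C_PTO):.3f} A, conversion factor = {factor:.2f}")
```

Because the factor depends only on a and c, a weak temperature dependence of the lattice constants translates into a weak temperature dependence of the conversion, consistent with panel b.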
Extended Data Figure 3 | Elastic energy contribution to nucleation
energy. a, Effective lattice constants across 90° domain walls. The inset is
the top view of the 40 × 40 × 40 supercell used in molecular dynamics
simulations; black arrows indicate the polarization direction. The
effective lattice constants ($a_X$ and $a_Y$) are defined in X-Y coordinates, as
explained in Extended Data Fig. 1. The averaged lattice constants for
each layer of cells across the domain wall along the [110] direction are
plotted. b, c, Distributions of strain gradient at the domain wall in the
presence of a nucleus. $a_Y$ and $a_Z$ are the effective lattice constants along
Y and Z in the absence of nucleus (t = 0 ps in molecular dynamics
simulations), respectively.
Extended Data Figure 4 | Schematic of a triangular-shaped nucleus,
as in the Miller-Weinreich model. The triangular-shaped nucleus
(red) has a polarization direction (white arrows) that is antiparallel to
its neighbouring domains (blue). The depolarization charges $\rho_{1,2}$ at two
boundaries are of the same sign, giving rise to repulsive energy penalty.
The expressions for nucleation energy ($U_{nuc}$), depolarization energy ($U_d$),
depolarization-contributed domain-wall energy ($\sigma_p$) and the dimensions
for the critical nucleus $a^*$ and $l^*$ are taken from the original work of Miller
and Weinreich, ref. 15; c and b are lattice constants (c = b in PbTiO3 and
BaTiO3), e is the base of natural logarithm, and $\epsilon$ is the dielectric constant.
The $\sigma_p/\sigma_w$ ratio determines the aspect ratio of the critical nucleus ($l^*/a^*$).
Extended Data Figure 5 | Distributions of polarization gradient at
the domain wall in the presence of a nucleus. a, b, The polarization
gradients ($dP_Y/dY$, a; $dP_Z/dZ$, b) are highest at the boundary of the
nucleus. The maximum polarization gradient is around 0.08 C m⁻² Å⁻¹,
much smaller than the value estimated by the classical theories in ref. 15
(0.25 C m⁻² Å⁻¹). This difference is due to the diffuse nature of the
boundary. The total boundary charge ($\rho_1 + \rho_2 + \rho_3 + \rho_4$) is zero.
Extended Data Figure 6 | Results for the Miller-Weinreich model of
nucleation on the PbTiO3 90° domain wall using various conditions
for the interface boundary. a, Nucleus energy U as a function of
Miller-Weinreich nucleus area (al, given in terms of the number of unit cells (uc))
for the original Miller-Weinreich model (black) and Miller-Weinreich
models with s=0.41, f-=1, Qtot ≠ 0 and f.= 1 (red), s=0.41, f.=2, Qtot ≠ 0
and f.= 1 (green), s=0.41, f-=2, Qtot = 0 and f-= 1 (blue), s=0.41, f.=2,
Qtot = 0 and f. = 1/2 (magenta), and s=0.41, f-=2, Qtot = 0 and f-= 1/3
(cyan). Inset, zoomed-out view showing all the curves. b, Aspect ratio of the
Miller-Weinreich nucleus ($l^*/a^*$) as a function of the ratio between $\sigma_p$ and $\sigma_w$.
The Miller-Weinreich assumption that $l^* \gg a^*$ is not valid for realistic values
of $\sigma_p$ and $\sigma_w$. c, $\sigma_p$ for different interface conditions. The actual $\sigma_p$ is much
smaller than the estimate used by Miller and Weinreich (MW; ref. 15).
Extended Data Figure 7 | Results for the Miller-Weinreich model of
nucleation on the PbTiO3 180° domain wall using various conditions
for the interface boundary. a, Nucleus energy U as a function of
Miller-Weinreich nucleus area (al, given in terms of the number of unit cells (uc))
for the original Miller-Weinreich model (black) and Miller-Weinreich
models with s=0.41, f.=1, Qtot ≠ 0 and f.= 1 (red), s=0.41, f.=2,
the estimate used by Miller-Weinreich (MW; ref. 15).
Self-assembly of microcapsules via colloidal bond
hybridization and anisotropy 


Chris H. J. Evers!, Jurriaan A. Luiken?, Peter G. Bolhuis* & Willem K. Kegel! 


Particles with directional interactions are promising building 
blocks for new functional materials and may serve as models for 
biological structures'->. Mutually attractive nanoparticles that 
are deformable owing to flexible surface groups, for example, 
may spontaneously order themselves into strings, sheets and 
large vesicles*®, Furthermore, anisotropic colloids with attractive 
patches can self-assemble into open lattices and the colloidal 
equivalents of molecules and micelles”~°. However, model systems 
that combine mutual attraction, anisotropy and deformability 
have not yet been realized. Here we synthesize colloidal particles 
that combine these three characteristics and obtain self- 
assembled microcapsules. We propose that mutual attraction and 
deformability induce directional interactions via colloidal bond 
hybridization. Our particles contain both mutually attractive 
and repulsive surface groups that are flexible. Analogously to 
the simplest chemical bond, in which two isotropic orbitals
hybridize into the molecular orbital of H2, these flexible groups
redistribute on binding. Via colloidal bond hybridization, isotropic 
spheres self-assemble into planar monolayers, whereas anisotropic 
snowman-shaped particles self-assemble into hollow monolayer 
microcapsules. A modest change in the building blocks thus results 
in much greater complexity of the self-assembled structures. In 
other words, these relatively simple building blocks self-assemble 
into markedly more complex structures than do similar particles 
that are isotropic or non-deformable. 

Deformability and mutual attraction have recently been combined 
for the self-assembly of nanoparticles by grafting flexible polymers onto 
the surface of mutually attractive particles. This results in isotropic 
clusters!” and self-assembled strings, sheets and large vesicles*?. For 
micrometre-sized colloids, on the other hand, coupling mutual attrac- 
tion and anisotropy leads to patchy particles. Attractive domains, or 
patches, have induced self-assembly into open lattices and the colloidal 
equivalents of molecules and micelles’~°. Here, we combine these three 
properties—mutual attraction, anisotropy and deformability—by 
synthesizing snowman-shaped particles that consist of a deformable 
core and a non-deformable lobe or protrusion. In the first part of this 
Letter, mutual attraction is combined with deformability, resulting in 
anisotropic or directional interactions as the flexible surface groups 
redistribute on binding (Fig. 1e). This process is analogous to bond
hybridization in quantum chemistry. When two hydrogen atoms bind
and form H2, for example, the electrons around each atom redistribute,
that is, two isotropic orbitals hybridize into the molecular orbitals of
H2. Similarly, when mutually attractive, deformable particles bind, the
flexible surface groups redistribute resulting in directional interactions. 
We refer to this effect as colloidal bond hybridization. We observe fun- 
damentally new behaviour on combining colloidal bond hybridiza- 
tion with anisotropy, that is, for particles that are mutually attractive 
and deformable as well as anisotropic. These snowman-shaped par- 
ticles self-assemble into microcapsules and form spherical cavities at 
high particle concentrations. We hypothesize that mutual attraction, 


anisotropy and deformability are sufficient to stabilize curved struc- 
tures and we support this hypothesis with computer simulations. 

We create isotropic as well as anisotropic building blocks that are 
mutually attractive and deformable. Before discussing the more com- 
plex anisotropic particles, we consider the basic principles of colloidal 
bond hybridization using mutually attractive, isotropic, deformable 
particles. These poly(styrene-co-acrylic acid) spheres are synthesized 
by copolymerization in water (Fig. 1a, b), and acrylic acid and styrene
are incorporated at different stages of the polymerization process!. 
Hence the particles consist of a hydrophobic polystyrene-rich core and 
a hydrophilic poly(acrylic acid)-rich brush. The particles are mutually 
attractive as hydrophobic polystyrene groups are present both in the 
interior of the particles and the brush. Furthermore, dynamic light 
scattering shows that the poly(acrylic acid)-rich brush can rearrange 
on the order of 0.1 μm, rendering the particles deformable (Extended
Data Fig. 1). 

Mutually attractive, isotropic, deformable particles self-assemble 
into planar monolayers in water (Fig. 1c, d). The monolayer sheets are 
hexagonally ordered and move freely in the solution (Supplementary 
Video 1). We hypothesize that a colloidal equivalent of bond hybrid- 
ization drives the formation of the monolayers. The polymer brush 
contains hydrophobic styrene groups as well as hydrophilic acrylic acid 
groups (Fig. 1e, shown in yellow and blue). The attraction between the
hydrophobic groups promotes compact structures, whereas excluded 
volume effects of the hydrophilic parts favour unbound particles. To 
accommodate both effects, the polymer brush rearranges on binding: 
hydrophobic parts interact in-plane, whereas hydrophilic parts expand 
out-of-plane. Consequently, directional interactions are induced and 
planar monolayers are formed (Fig. 1e).

This segregation process is similar to phase segregation in the 
self-assembly of block copolymers’. In our system, however, copoly- 
mers are anchored to the surface of micrometre-sized particles. 
Consequently, the molecular segregation of the polymers induces 
directional interactions on the colloidal length scale. Our observations 
are also in line with results for polymer-grafted nanoparticles that are 
mutually attractive, isotropic and deformable*®" but here directional 
interactions are induced for particles that are two orders of magnitude 
larger than in previous work. Finally, DNA-coated colloids can also 
form crystalline monolayers’*, but for these particles a functionalized 
surface induces directional interactions. 

Below, we combine colloidal bond hybridization with anisotropy, 
which results in fundamentally new behaviour. Anisotropic building 
blocks are synthesized by growing a rigid protrusion onto the deform- 
able spheres of Fig. 1 (Extended Data Fig. 2). The second lobe is grown 
by swelling with additional styrene'®'*. Furthermore, we increase 
the attraction between the deformable lobes by functionalizing the 
poly(acrylic acid)-rich brush with hydrophobic groups’? (Extended 
Data Fig. 10). Next, the particles are washed by centrifugation. Finally, 
we obtain snowman-shaped particles that consist of a deformable lobe 
and a non-deformable lobe (Fig. 2a). 


1Van 't Hoff Laboratory for Physical and Colloid Chemistry, Debye Institute for Nanomaterials Science, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands. 2Van ’t Hoff Institute for 
Molecular Sciences, University of Amsterdam, PO Box 94157, 1090 GD Amsterdam, The Netherlands. 




© 2016 Macmillan Publishers Limited. All rights reserved 




Figure 1 | Self-assembled planar monolayers. a, b, Mutually attractive, 
isotropic, deformable particles (a, b, TEM images) consist of a polystyrene- 
rich core (red), a deformable poly(acrylic-acid)-rich brush (blue) and 
mutually attractive moieties (yellow). c, d, In water, the particles self- 
assemble into planar hexagonal monolayers seen from the top (c) and 

side (d) by reflected light microscopy. e, In colloidal bond hybridization, 
surface groups of deformable particles redistribute on binding; mutually 
attractive moieties move towards the contact area (brown arrows), whereas 
hydrophilic chains move into the solution (blue arrows). Hence, new 
bonds are formed in-plane (black arrow) and not out-of-plane (crossed 
black arrows). Scale bars, 500 nm. 


These mutually attractive, anisotropic, deformable particles self- 
assemble into monolayer microcapsules (Fig. 2). The microcapsules can 
be observed with scanning electron microscopy (SEM) after sintering 
or freeze drying (Fig. 2a, Extended Data Fig. 3). Owing to the rela- 
tively large size of the particles, however, we can study these structures 
in solution using optical microscopy (Fig. 2g—j). The microcapsules 
consist of a particle monolayer whereas the interior is water-filled (Fig. 2c, 
Supplementary Video 3). Furthermore, most particles align tangen- 
tially to the surface of the microcapsules, with the protrusions pointing 
either slightly inwards or slightly outwards (Fig. 2a, b, Extended Data 
Fig. 3). 

For particles with a large lobe of 0.540 μm in diameter, the mean
diameter of the microcapsules is 3.7 ± 0.8 μm, corresponding to
about 10² particles per microcapsule. Most particles have six nearest
neighbours, but pentagons occur frequently as expected from the
Euler characteristic of a sphere (Fig. 2a, Supplementary Video 2). The 
structure of the microcapsules, however, has no overall icosahedral 
symmetry. Furthermore, excess styrene is removed before the micro- 
capsules are formed, so unlike colloidosomes—which are formed on 
emulsion droplets”’—no template is involved. 
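Both numerical statements in this paragraph can be checked with a short estimate. The sketch below assumes that the particles tile the capsule surface like hexagonally packed discs of the large-lobe diameter, which is a simplification introduced here rather than a model from the text. It also verifies the topological constraint behind the pentagons: for any closed triangulated sphere, the sum of (6 − z) over all particles, with z the number of nearest neighbours, equals 12.

```python
import math

HEX_PACKING = math.pi / (2.0 * math.sqrt(3.0))  # area fraction of hexagonally packed discs

def particles_per_capsule(capsule_diameter, particle_diameter, packing=HEX_PACKING):
    """Sphere area divided by the area per disc-like particle (simplifying assumption)."""
    sphere_area = math.pi * capsule_diameter**2
    disc_area = math.pi * (particle_diameter / 2.0) ** 2
    return packing * sphere_area / disc_area

def topological_charge(neighbour_counts):
    """Sum of (6 - z); Euler's formula fixes this at 12 for a closed triangulated sphere."""
    return sum(6 - z for z in neighbour_counts)

n_est = particles_per_capsule(3.7, 0.540)   # micrometres, values from the text
icosahedron = [5] * 12                      # smallest case: 12 five-fold sites
capsule_like = [5] * 12 + [6] * 150         # 12 pentagonal sites plus six-fold sites
print(round(n_est), topological_charge(icosahedron), topological_charge(capsule_like))
```

The estimate lands in the low hundreds, consistent with the order of magnitude quoted above, and the charge of 12 shows why at least twelve five-fold sites are unavoidable however many six-fold sites are added.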

By systematically varying the complexity of the particles, we iden- 
tified that in our system mutual attraction, anisotropy and deform- 
ability are required for self-assembly into microcapsules (Extended 
Data Fig. 4a—d). First, before functionalization with mutually attractive 
groups, no microcapsules are found for any of the anisotropic, deform- 
able particles in Fig. 2d—-f, demonstrating the importance of mutual 
attraction. Second, for hydrophobically functionalized, deformable, 
but isotropic spheres, no microcapsules are observed. Finally, for 
functionalized, anisotropic, but non-deformable snowman-shaped 
particles, no microcapsules are observed. These three characteristics 
seem sufficient to induce self-assembly into microcapsules, and are 
relatively easy to implement experimentally. In contrast, particles 
with four orthogonally attractive patches, which have previously been 
predicted to induce self-assembly into microcapsules, have not been 
Figure 2 | Self-assembled monolayer microcapsules. a, Deformable,
anisotropic, mutually attractive particles (inset) consist of a core (red)
with hydrophilic (blue) and hydrophobic (yellow) moieties and a rigid
protrusion (green), and self-assemble into microcapsules (SEM image after
sintering). b, c, In solution the particles align tangentially (b; reflected
light microscopy) and the microcapsules are water-filled (c; confocal
fluorescence microscopy with dyed water phase). d–f, The synthesized
particles have large lobes with diameters of 445 nm (d), 540 nm (e) and
626 nm (f; TEM images). g–i, For each size (bright field microscopy)
self-assembled microcapsules are found on functionalization with
fluoresceinamine. j, On functionalization with tert-butylamine
(bright field microscopy), self-assembled microcapsules are also found.
Scale bars, 1 μm.
experimentally realized yet?!. Moreover, the self-assembling tendency 
is robust, as both monolayer microcapsules and planar monolayers 
are found for snowman-shaped particles with large lobes of diame- 
ters ranging from 445 nm to 626 nm, and for hydrophobic function- 
alization with either tert-butylamine or fluoresceinamine (Fig. 2d-j, 
Extended Data Fig. 5). 

At high particle concentrations, mutually attractive, anisotropic, 
deformable particles form curved hollow structures or cavities, and 
these are probably intermediates in the formation of the microcap- 
sules. The tendency to form cavities is robust, as similar structures are 
observed for five different experimental conditions. The first condition 
is at the edge of an evaporating droplet on a glass slide (Fig. 3a—e). 
The evaporation of a droplet of particles in water causes a particle 
flow towards the glass—water-air contact line that is known as the 
coffee-stain effect””. For our particles, hemispherical cavities are spon- 
taneously formed in the resulting dense layer near the contact line 
(Fig. 3b-e, Extended Data Fig. 6a—-d). The second condition is for a 
droplet that is confined between two parallel glass slides (Fig. 3f-j, 
Supplementary Video 4). Again, a dense layer is formed at the contact 
line, but this layer is two-dimensional with circular cavities. The third 
condition is at particle volume fractions of ~0.2, where a highly fluc- 
tuating ‘cavity phase’ is formed (Fig. 3k-n, Supplementary Video 5). 
In this phase, we observe coexisting regions on the order of 1–10 μm
with either high particle concentrations or virtually no particles, that 
is, dense curved structures around cavities. For the fourth condition, 
Figure 3 | Cavity formation on densification. a–n, Cavities are observed at
the edge of an evaporating droplet (a–e, multilayer; f–j, monolayer) and in a
cavity phase (k–n). Scale bars, 1 μm. o, p, The histograms of the interparticle
distance $r_{ij}$ (o) peak at 0.72 μm (cavity phase, blue asterisks) and at 0.62 μm
(microcapsules, red circles), and the effective pair potential $w(r_{ij})$ (p) has a
minimum of −0.8$k_BT$. a.u., arbitrary units. q–v, Cavities are also observed
after centrifugation in a thin cell (q–s, bright regions), and upon turning the


on centrifuging particles in a capillary, cavities are observed in the 
sediment (Fig. 3q-s and Extended Data Fig. 7a). Finally, on dilut- 
ing the sediment, the formation of circular cavities is again observed 
(Fig. 3t-v, Supplementary Video 6). 

For all five conditions, the diameters of the cavities are compara- 
ble to the diameters of the microcapsules. Furthermore, by system- 
atically varying the complexity of the building blocks, we conclude 
that—as for the formation of the microcapsules—mutual attraction, 
anisotropy and deformability all greatly influence the formation of the 
cavities (Extended Data Fig. 4). For isotropic particles, we proposed that 
mutual attraction and deformability induce the observed self-assembly 
into planar monolayers by colloidal bond hybridization. On the basis 
of the above observations, we hypothesize that adding anisotropy to 
mutually attractive and deformable colloids induces a shift from planar 
to curved structures, resulting in microcapsules and cavities. 

The formation of microcapsules is a rare event as only 1 in every 
10* particles ends up in a microcapsule. This can be ascribed to the 
specific orientation of many particles that is required for the formation 
of microcapsules, and the initially weak attractive interactions. The 


cell upside down (t–v). Scale bars, 10 μm. w, x, Particles form a cavity phase
on densification (w), and are pushed into close contact on centrifugation (x). 
y, We propose that after redispersion, the particles that surrounded cavities 
are found as microcapsules. Large arrows indicate the flow, ?; small arrows 
indicate the gravitational field, ¢; t denotes time. In a, f, q and t, particles are 
coloured red, the solvent is light blue, glue is dark blue and glass is grey. In 
w-y, cores are red, protrusions are green and deformable brushes are light blue. 


latter becomes apparent as the minimum in the effective pair potential
is comparable to the thermal energy, $k_BT$ (where $k_B$ is Boltzmann’s
constant and T is temperature; Fig. 3p), and particles do not form
lasting clusters on collision (Supplementary Video 5). Furthermore,
for particles in such non-lasting clusters, the centre-to-centre distance
distribution peaks at a distance that is 0.1 μm larger than for particles
in microcapsules (Fig. 3o). The difference in the centre-to-centre
distance can be attributed to the hydrophilic chains with an estimated
length of about 0.1 μm. These chains need to move out of the binding
site on bond formation (Fig. 1e).
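One common way to turn a distance histogram into an effective pair potential is the potential of mean force, w(r)/k_BT = −ln g(r), where g(r) is the histogram normalized by its large-distance plateau. Whether this is the procedure used for Fig. 3p is not stated in this part of the text, so the sketch below is an assumption. The histogram values are synthetic, and the counts are taken to be already corrected for the geometric growth of the shell size with r.

```python
import math

def effective_potential(counts, reference_count):
    """Return w/kBT = -ln(count / reference) per bin; empty bins map to +infinity."""
    potential = []
    for count in counts:
        if count > 0:
            potential.append(-math.log(count / reference_count))
        else:
            potential.append(math.inf)
    return potential

# Synthetic, hypothetical histogram: a peak 2.2 times higher than the plateau.
bin_centres_um = [0.55, 0.60, 0.65, 0.72, 0.80, 0.90, 1.00]
counts = [0, 20, 150, 220, 130, 102, 100]
w_over_kT = effective_potential(counts, reference_count=100)

k_min = min(range(len(w_over_kT)), key=lambda k: w_over_kT[k])
print(f"minimum of {w_over_kT[k_min]:.2f} kBT at r = {bin_centres_um[k_min]} um")
```

In this picture a well depth of about 0.8 k_BT corresponds to a histogram peak only about twice the plateau value, which illustrates how weak the initial attraction is relative to thermal motion.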

We propose that both the specific orientation and the formation 
of lasting bonds are induced by centrifugation with cavities as inter- 
mediates. The synthesis consists of several centrifugation steps and 
centrifugation induces the formation of spherical cavities with a sim- 
ilar size and shape as the microcapsules (Fig. 3q-s). Centrifugation 
thus aligns the particles in a specific microcapsule-like orientation. 
Furthermore, on centrifugation, particles are pushed close together. 
This could push the hydrophilic chains out of the binding site, and
induce the formation of irreversible bonds that arise from van der 
Figure 4 | Monte Carlo simulations. a, The average number of bonds,
$\langle N_b \rangle$, decreases with the covered surface fraction, Q. Inset, deformable
spheres are modelled as attractive spheres (red in all panels) with f mobile
penetrable hard spheres (blue in all panels). b–d, This results in compact
clusters (b, black circles in a, e), bilayers (c, blue diamonds in a, e) and
monolayers (d, red crosses in a, e). e, The average number of bonds for

Waals forces between the colloids at close proximity. Although we 
have no direct, real-space proof of the formation mechanism, micro- 
capsules could be formed as follows: first, particles form a dense sedi- 
ment with cavities (Fig. 3w); next, centrifugation pushes the particles 
closer together and irreversible bonds are formed (Fig. 3x); finally, 
after shaking, the particles that surrounded cavities are found as 
microcapsules (Fig. 3y). 

We test the key hypothesis that combining colloidal bond hybridi- 
zation with anisotropy can stabilize curved monolayers using Monte 
Carlo simulations. First, we develop a simple model for mutually 
attractive, isotropic, deformable particles. Next, we extend this model 
with anisotropy. 

Mutually attractive, isotropic, deformable particles are modelled 
as central spheres with f satellite spheres each (Fig. 4a). The size 
ratio between the central and the satellite spheres is q and the latter 
are penetrable hard spheres that model the flexible surface groups. 
Penetrable hard spheres can interpenetrate other satellite spheres, but 
have excluded volume interactions with the central spheres. Mutual 
attraction is captured by a square-well interaction between the central 
spheres, and deformability is incorporated as the satellite spheres can 
freely move over the surface of the central sphere. 
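The interaction rules of this model can be written down compactly. The sketch below is one possible reading, not the authors' implementation: it assumes that q is the satellite-to-central diameter ratio, that satellite centres sit at the contact distance from their own central sphere, and that the well width and depth (`well_width`, `eps`) are free parameters.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Particle:
    centre: tuple                                   # position of the central sphere
    satellites: list = field(default_factory=list)  # positions of its satellite spheres

def pair_energy(p1, p2, sigma, well_width, eps, q):
    """Pair energy: hard core, square well, and central-satellite exclusion.

    Central spheres (diameter sigma) attract with depth eps out to
    sigma * (1 + well_width). Satellites (diameter q * sigma, an assumed
    definition of q) ignore each other but cannot overlap the other
    particle's central sphere.
    """
    r = math.dist(p1.centre, p2.centre)
    if r < sigma:
        return math.inf
    contact = 0.5 * sigma * (1.0 + q)  # central-satellite contact distance
    for owner, other in ((p1, p2), (p2, p1)):
        if any(math.dist(s, other.centre) < contact for s in owner.satellites):
            return math.inf
    return -eps if r < sigma * (1.0 + well_width) else 0.0

SIGMA, Q = 1.0, 0.4
CONTACT = 0.5 * SIGMA * (1.0 + Q)
neighbour = Particle((1.05, 0.0, 0.0), [(1.05, 0.0, CONTACT)])
out_of_plane = Particle((0.0, 0.0, 0.0), [(0.0, 0.0, CONTACT)])  # satellite moved aside
in_bond_site = Particle((0.0, 0.0, 0.0), [(CONTACT, 0.0, 0.0)])  # satellite blocks the bond

e_bound = pair_energy(out_of_plane, neighbour, SIGMA, 0.1, 1.0, Q)
e_blocked = pair_energy(in_bond_site, neighbour, SIGMA, 0.1, 1.0, Q)
print(e_bound, e_blocked)
```

The two cases show the mechanism in miniature: a bond forms only when the satellite has moved out of the binding site, so bonding and satellite redistribution are coupled.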

To verify if colloidal bond hybridization can induce directional 
interactions—that is, if rearrangement of surface groups can stabilize 
monolayers—we start Monte Carlo simulations in a hexagonal planar 
configuration. Different morphologies are observed when the size and 
the number of satellite spheres are varied (Extended Data Fig. 8b, i-k). 
These two variables can be combined in the covered surface fraction, 


Q= Te On plotting the average number of bonds, (Nj), as a 
function of Q, all of the data collapse onto a single curve (Fig. 4a). 
Furthermore, the transitions between the different morphologies 
occur at well-defined Q values, showing that Q dictates the
morphology (Fig. 4a, Extended Data Fig. 8b). If Q is small, the particles
reorganize into compact aggregates with many bonds to maximize the 
attractive interaction between the central spheres (Fig. 4b). When the 
size or number of satellite spheres is increased, however, they redis- 
tribute out-of-plane and mechanically stable bilayers and monolayers 
are observed (Fig. 4c, d). Colloidal bond hybridization can thus induce 
directional interactions and stabilize monolayers for isotropic parti- 
cles, which agrees well with the results in Fig. 1. 

Starting with unbound particles (Extended Data Fig. 8a, c—h), 
we qualitatively reproduce simulation results that have been obtained 
using a more detailed model with tethered chains*. Hence we conclude 
that, although simple, our model captures the main ingredients that 
snowman-shaped particles. Inset, snowman-shaped particles are modelled
as deformable spheres with a rigid protrusion (green in e–i). f–i, From
flat monolayers (f, red crosses in e), the structure changes to curved
monolayers with in-plane protrusions (g, green bisected circles in e) and
out-of-plane protrusions (light red squares in e) seen from the side (h) and
the front (i) at high Q values.


induce directional interactions: mutual attraction and deformability. 
As our model does not contain any molecular details, we expect this 
behaviour to be generic. This expectation is in line with experimental 
systems of polymer-grafted nanoparticles where both mutual attrac- 
tion and deformability can be identified and directional interactions 
are induced*®, 

We model mutually attractive, anisotropic, deformable particles by 
adding a rigid sphere to the deformable sphere (Fig. 4e, Extended 
Data Fig. 9). The rigid sphere models the polystyrene protrusion in 
the most primitive way, and its hydrophobicity is captured by a short- 
range attraction with other rigid and central spheres. As for isotropic 
particles, the values for the average number of bonds collapse on a 
single curve as a function of Q (Fig. 4e, equation (11)), and planar 
monolayers are stable at moderate Q values (Fig. 4f). On increasing 
the size and the number of satellite spheres further, however, curved 
monolayers with protrusions pointing just out-of-plane (Fig. 4g) are 
observed, whereas for large and many satellite spheres, all protrusions 
point inwards (Fig. 4h, i). 

We conclude that, as hypothesized, colloidal bond hybridization 
can stabilize monolayers and anisotropy induces a shift from planar 
to curved structures. Furthermore, hemispherical monolayers with 
in-plane protrusions (Fig. 4i, Supplementary Video 7) resemble 
segments of the experimentally observed microcapsules. 

Here we combined mutual attraction, anisotropy and deformabil- 
ity in colloidal model particles. Mutual attraction and deformability 
are thought to cause surface groups to rearrange on binding, which 
is a colloidal equivalent of bond hybridization, whereas anisotropy 
induces curvature. These three characteristics are probably sufficient 
to induce self-assembly into microcapsules, a process that—to the best 
of our knowledge—had not been realized before in a colloidal model 
system. We note that although the details are different, the same three
properties can also be identified in the building blocks of viruses,
suggesting that these characteristics could also be important in the
assembly of virus microcapsules. The mechanism we find is fundamentally
different from previous work, where directional interactions
are induced by rigid patches, structural rearrangements on changing
the solvent or electric fields. Quantification of the attraction
strength, anisotropy, number of particle lobes and deformability leads
to a large parameter space that remains to be systematically explored.
Theoretical work that extends previous studies is needed to predict
self-assembled structures from these properties. Independently
controlling these properties seems impossible for proteins; colloidal 
particles, on the other hand, are promising building blocks to address 
this challenge. 
METHODS


Chemicals. Unless stated otherwise, the following chemicals were used as
received: acrylic acid (AA, 99%), aluminium oxide (Al₂O₃, puriss, >98%),
tert-butylamine (>99.5%), divinylbenzene (DVB, 55%, mixture of isomers),
fluorescein sodium salt (F6377), fluoresceinamine (mixture of isomers, >75%),
N-(3-dimethylaminopropyl)-N’-ethylcarbodiimide hydrochloride (EDC, purum,
>98.0%), 2-(N-morpholino)ethanesulfonic acid (MES, >99%), sodium phosphate
dibasic (Na₂HPO₄, BioXtra, >99%), sodium phosphate monobasic dihydrate
(NaH₂PO₄·2H₂O, BioUltra, >99%) and styrene (ReagentPlus, >99%), were
obtained from Sigma-Aldrich or its subsidiaries; 2,2’-azobis(2-methylpropionitrile)
(AIBN, 98%), potassium chloride (KCl, pro analysi) and potassium persulfate
(reagent ACS, 99+%) were obtained from Acros Organics; hydroquinone (puriss,
>99.5%) was obtained from Riedel-de Haën; glycerol (Ph Eur) was obtained
from Bufa; ethanol (100%) was obtained from Interchema; potassium hydroxide
(KOH) was obtained from Emsure; hydrochloric acid (HCl, 37%) was obtained
from Merck; and Millipore water (MQ) was obtained with a Synergy water
purification system.

Synthesis. The synthesis is outlined in Extended Data Fig. 2a and involved the 
emulsifier-free polymerization of cross-linked poly(styrene-co-acrylic acid) 
(CPSAA) spheres, the formation of protrusions by swelling with styrene, heat- 
ing and polymerizing and the covalent linking of hydrophobic moieties to the 
carboxylic groups. 

Cross-linked poly(styrene-co-acrylic acid) spheres of 0.530 ± 0.014 μm in diameter
were prepared by emulsifier-free polymerization of styrene, acrylic acid and
divinylbenzene following refs 11 and 12. 90 ml MQ and 11 ml styrene were passed
over an Al₂O₃ column; 761 μl freshly opened AA and 55 μl DVB were added to a
250 ml three-neck round-bottom flask. The flask was constantly and vigorously
stirred with a glass stirrer under a nitrogen flow. Quantitatively, 0.05 g potassium
persulfate was dissolved as an initiator and added to the flask with 10 ml MQ. After
15 min, the nitrogen inlet was raised above the liquid level, and after a further
15 min, the flask was immersed in an oil bath at 70°C to start the polymerization.
After 20 h, a milky-white dispersion was obtained. Excess reactants were removed
by centrifugation (Beckman Coulter Allegra X-12R). On centrifugation, the particles
settled at the bottom of the sample, whereas unreacted chemicals remained in the
supernatant. The dispersion was washed by centrifugation at 2.1 × 10³ g
three times and the supernatant was replaced by MQ.

Cross-linked polystyrene (CPS) spheres were prepared by a similar method. For
these particles 225 ml MQ, 23.5 ml styrene and 0.7 ml DVB were added to a 500- 
ml one-neck round-bottom flask. The flask was constantly and vigorously stirred 
with a polytetrafluoroethylene-coated stir bar and immersed in an 80°C oil bath. 
0.78 g potassium persulfate in 37.5 ml MQ was then added. After 24h a milky- 
white dispersion was obtained, which was washed three times by centrifugation 
and redispersed in MQ. 

To form a protrusion, the CPSAA spheres were swollen with styrene, heated
and polymerized, in line with refs 16-18. In a typical experiment, the solid mass
fraction, as determined by drying, was brought to 3%-6% with MQ. About 5 ml
of the dispersion was magnetically stirred with a polytetrafluoroethylene-coated
stir bar in a glass tube. Styrene was added with a swelling ratio $S = m_{\mathrm{s}}/m_{\mathrm{p}}$ = 3-7
with $m_{\mathrm{s}}$ and $m_{\mathrm{p}}$ being the mass of added styrene and the solid mass in the
dispersion, respectively. After one to two days of stirring, the tube was immersed
in an 80°C oil bath for two hours under continuous stirring to form a styrene
protrusion. Next, 500 μl of an aqueous hydroquinone solution (45 mg in 50 ml)
and 5 mg AIBN in 250 μl styrene were added, and the tube was again immersed in
the oil bath at 80°C for 24 h to polymerize the protrusion. Finally, a milky-white
dispersion was obtained. A millimetre-sized solid white aggregate was often found
that could easily be removed.

Hydrophobic moieties are covalently linked to carbodiimide-activated carboxylic
groups on the CPSAA particles by a method adapted from ref. 19. In a typical
synthesis, a 0.1 M MES buffer (1.95 g in 100 ml MQ), and a 0.2 M phosphate buffer
(6.72 g Na₂HPO₄ and 0.41 g NaH₂PO₄·2H₂O in 250 ml MQ) were prepared. 2.5 ml
of the dispersion was centrifuged and the supernatant was replaced by an EDC/
MES solution (45 mg EDC quantitatively added with 40 ml of the MES buffer)
to activate the AA groups. The dispersion was tumbled at 60 rpm for one hour,
and washed by centrifugation at 2.1 × 10³ g with an MES buffer and twice with
MQ. The dispersion was again centrifuged and after removal of the supernatant,
0.028 mmol fluoresceinamine or tert-butylamine was quantitatively added with
30 ml phosphate buffer to covalently bind fluoresceinamine or tert-butylamine to
the activated AA groups. The tube was wrapped in aluminium foil, and after tumbling
at 60 rpm overnight, the dispersion was washed three times with a phosphate
buffer, once with an MES solution and five times with MQ. Finally, a milky-white
dispersion was obtained. The covalent linkage of fluoresceinamine was verified by
varying the fluoresceinamine coupling method and studying the resulting washed
particles using fluorescence microscopy (Extended Data Fig. 10). The preparation
of non-deformable, fluoresceinamine-functionalized snowman-shaped particles
was previously described in ref. 28.
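
The buffer recipes quoted above (1.95 g MES in 100 ml for 0.1 M; 6.72 g sodium phosphate dibasic plus 0.41 g sodium phosphate monobasic dihydrate in 250 ml for 0.2 M) can be cross-checked with a few lines of arithmetic. The following is only a sketch: the molar masses are standard values that are not stated in the text, the dibasic salt is assumed to be anhydrous, and the solution volume is assumed to equal the stated volume of MQ.

```python
# Molar masses in g/mol (standard values, assumed; not given in the text).
MOLAR_MASS = {
    "MES": 195.24,           # 2-(N-morpholino)ethanesulfonic acid
    "Na2HPO4": 141.96,       # sodium phosphate dibasic, assumed anhydrous
    "NaH2PO4.2H2O": 156.01,  # sodium phosphate monobasic dihydrate
}


def molarity(mass_g, molar_mass, volume_ml):
    """Concentration in mol/l for mass_g of solute in volume_ml of solution."""
    return mass_g / molar_mass / (volume_ml / 1000.0)


mes = molarity(1.95, MOLAR_MASS["MES"], 100)
phosphate = (molarity(6.72, MOLAR_MASS["Na2HPO4"], 250)
             + molarity(0.41, MOLAR_MASS["NaH2PO4.2H2O"], 250))

print(f"MES buffer:       {mes:.3f} M")
print(f"phosphate buffer: {phosphate:.3f} M")
```

Under these assumptions both values agree with the nominal 0.1 M and 0.2 M concentrations to within rounding.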

TEM. Transmission electron microscopy (TEM) images were taken with a Philips 
TECNAI 10 at 100 kV and 16 bit. Samples were prepared by drying a diluted
dispersion droplet on a polymer coated copper grid under illumination with a 
heat lamp. The image levels were linearly rescaled using ImageMagick so that 
99.9% of all of the values were between the lower and upper level thresholds. 
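
The level rescaling described above was done with ImageMagick; the sketch below only illustrates the operation in plain Python. The function name `rescale_levels`, the equal split of the excluded 0.1% over the dark and bright tails, and the 8-bit output range are assumptions made for illustration.

```python
def rescale_levels(pixels, coverage=0.999, out_max=255):
    """Linearly map pixel values so that a fraction `coverage` of them lies
    between the lower and upper thresholds; values outside are clipped."""
    ordered = sorted(pixels)
    n = len(ordered)
    tail = (1.0 - coverage) / 2.0  # assumed: equal dark and bright tails
    lo = ordered[int(tail * (n - 1))]
    hi = ordered[int(round((1.0 - tail) * (n - 1)))]
    if hi == lo:
        return [0 for _ in pixels]  # flat image: nothing to stretch
    scale = out_max / (hi - lo)
    return [min(out_max, max(0, round((p - lo) * scale))) for p in pixels]


# Hypothetical 16-bit ramp standing in for image data.
stretched = rescale_levels(list(range(10000)))
```

Setting `coverage=1.0` reduces this to the darkest-to-brightest rescaling used for several of the optical images.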

Figure 1a was obtained at 9.7 nm per pixel and Fig. 1b at 0.95 nm per pixel. 
Figure 2d, e was obtained at 3.5 nm per pixel, Fig. 2f at 6.8 nm per pixel and the 
insets in Fig. 2d-f at 0.95 nm per pixel. Extended Data Fig. 2b, d was obtained at
4.9 nm per pixel, and Extended Data Fig. 2c, e at 0.95 nm per pixel. Extended Data
Fig. 5a-d was obtained at 0.95 nm per pixel.

Particle sizes were measured using Gaussian ring transformations in Wolfram
Mathematica 10 (https://www.wolfram.com/mathematica/). For spherical particles,
a gradient transform was computed using discrete derivatives of a Gaussian. Circles
were detected by iteratively convolving with Gaussian rings and finding the maxima.

SEM. By freeze-drying or sintering, self-assembled structures could be preserved
and studied using SEM. For freeze-drying, 1 μl of the dispersion was deposited on a
polymer-coated copper grid. The grid was vitrified in liquid nitrogen and mounted
on a cryotransfer unit that was then brought under a vacuum. The temperature
was increased to −90°C at 5°C min⁻¹ and kept constant for about six hours to
allow the water to sublimate.

For sintering, the sample was heated above the glass transition temperature of
polystyrene at about 100°C. First, the dispersion was centrifuged and after
redispersion in 1:1 glycerol:water, it was immersed in a 105°C oil bath for 30 min.
The dispersion was washed three times by centrifugation with MQ and 1 μl was
placed on a polymer-coated copper grid. After drying, the sample was coated with
a ~6-nm-thick layer of platinum.

Both samples were studied with a FEI XL30 FEG operated at 5-10 kV, and
images were obtained at 8 bit. The image levels were linearly rescaled using
ImageMagick, from the value of the pixel with the lowest intensity to the brightest
pixel. Figure 2a was obtained at 6.5 nm per pixel. Extended Data Fig. 3c-e, h-j was
obtained at 3.5 nm per pixel, 1.9 nm per pixel, 3.5 nm per pixel, 11 nm per pixel,
6.9 nm per pixel and 11 nm per pixel, respectively.

Optical microscopy. Bright-field, fluorescence and reflected light microscopy
images were captured with a Nikon Ti-E inverted microscope unless stated otherwise.
The Nikon Ti-E was operated with a Nikon TIRF 100× numerical aperture
1.49 objective, intermediate magnification of 1.5, and a Hamamatsu ORCA Flash
camera at 16 bit. For reflected light microscopy, a Nikon Intensilight C-HGF light
source was used with a Nikon D-FLE filter block. For fluorescence microscopy, the
same light source was used with a Semrock FITC-3540C filter block. The reflected
light microscopy images in Fig. 3k-n were obtained with a Nikon Ti-U inverted
microscope with a Nikon Plan Apo VC 100× numerical aperture 1.40 objective,
intermediate magnification of 1.5, and a Lumenera Infinity X camera at 8 bit.
The bright field microscopy images in Fig. 3r-s, u-v, Extended Data Fig. 4i-v
and Extended Data Fig. 7 were obtained with a Nikon Eclipse LV100POL microscope
with its focal plane parallel to the gravitational field, a Nikon Plan Fluor
ELWD 40× numerical aperture 0.6 objective and a QImaging MicroPublisher 5.0
camera at 8 bit. Finally, confocal microscopy images were captured with a Nikon
TE2000-U, with a Nikon Plan Apo 100× numerical aperture 1.40 objective, a 488-nm
laser and a 590-nm detector at 12 bit. For images obtained with the Ti-E and
the TE2000-U, bitmaps were extracted from the microscopy files using bfconvert 
5.1.7 (Open Microscopy Environment). The image levels were linearly rescaled 
using ImageMagick. 

Figure 1c, d was obtained with the Ti-E in reflected light mode at 40 nm
per pixel and image levels were linearly rescaled from the value of the darkest to 
the brightest pixel. Figure 2b was obtained with the Ti-E in reflected light mode 
at 43 nm per pixel, and levels were linearly rescaled from the value of the darkest 
pixel to the brightest pixel. Figure 2c was obtained with the TE2000-U in confocal 
fluorescence mode at 35.72 nm per pixel, and levels were linearly rescaled from 
zero to the value of the brightest pixel. Fluorescein sodium salt was added to the 
water phase, and the image was false coloured green. Figure 2g-j was obtained with 
the Ti-E in bright-field mode at 43 nm per pixel, and levels were linearly rescaled 
from zero to the value of the brightest pixel. Figure 3b-e was obtained with the 
Ti-E in bright field mode at 40 nm per pixel, and levels were linearly rescaled from 
zero to the value of the brightest pixel. Figure 3g-j was obtained with the Ti-E in 
reflected light mode at 43 nm per pixel and levels were linearly rescaled from the 
value of the darkest pixel to the brightest pixel. Figure 3k-n was obtained with the
Ti-U in reflected light mode at 29 nm per pixel, and levels were linearly rescaled from
the value of the darkest pixel to the brightest pixel. Figure 3r, s, u, v was obtained
with the LV100POL at 86 nm per pixel. For Fig. 3s, v levels were linearly rescaled
from the value of the darkest pixel to the brightest pixel, and these thresholds were
also used for Fig. 3r, u. Extended Data Fig. 2f, g was obtained with the Ti-E in 
fluorescence mode at 43 nm per pixel, and levels were linearly rescaled from zero 
to the value of the brightest pixel. Extended Data Fig. 4a was obtained with the 
Ti-E in bright field mode at 43 nm per pixel, and levels were linearly rescaled from 
zero to the value of the brightest pixel. Extended Data Fig. 4e-h was obtained with 
the Ti-E in bright field mode at 40 nm per pixel, and levels were linearly rescaled 
from zero to the value of the brightest pixel. Extended Data Fig. 4i-v was obtained 
with the LV100POL at 86 nm per pixel. For Extended Data Fig. 4m-o, t-v, levels 
were linearly rescaled from the value of the darkest pixel to the brightest pixel, and 
these thresholds were also used for Extended Data Fig. 4i-k, q-s. For Extended 
Data Fig. 4l, p, levels were linearly rescaled from zero to the value of the brightest
pixel. Extended Data Fig. 5e-h was obtained with the Ti-E in bright field mode 
at 43 nm per pixel, and levels were linearly rescaled from zero to the value of the 
brightest pixel. The images in Extended Data Fig. 6 were obtained with the Ti-E in 
bright field mode at 40 nm per pixel, and levels were linearly rescaled from zero to 
the value of the brightest pixel. The images in Extended Data Fig. 7 were obtained 
with the LV100POL at 86 nm per pixel. Image levels were linearly rescaled using 
the thresholds from Fig. 3s and Extended Data Fig. 4m-o. Extended Data Fig. 
10a-d was obtained with the Ti-E in fluorescence mode at 43 nm per pixel, levels 
were linearly rescaled from zero to the value of the brightest pixel, and the images 
were false coloured in green. 

Typically, a sample was prepared by placing 0.5-2 μl of the dispersion between a
microscope slide (Menzel-Gläser) and a #1.5 cover slip (Menzel-Gläser) with two
#0 cover glasses (VWR) as spacers. Before use, the slides were cleaned with MQ,
ethanol and Kimtech precision wipes. The cells were then sealed with glue (Norland
NOA81, after ultraviolet curing) or scotch tape. The evaporating droplets between
two glass slides (Fig. 3f-j) were studied in cells as described above, but without
sealing the sides of the cell. Evaporating droplets on a glass slide (Fig. 3a-e, Extended
Data Fig. 4e-h and Extended Data Fig. 6), on the other hand, were studied by placing
0.5 μl of the dispersion onto a cleaned #1.5 cover slip (Menzel-Gläser). Droplets
evaporated spontaneously and for each dispersion four (two for non-deformable
particles) time series were obtained of an 89 μm × 89 μm region near the contact line
(Extended Data Fig. 6). High-concentration samples (Fig. 3k-n, Supplementary
Video 5) were studied after sedimentation in the gravitational field in a similar cell to
that described above. Thin cells in Fig. 3q-v, Extended Data Fig. 4i-v and Extended
Data Fig. 7 were prepared in a capillary and sealed with glue, while preventing contact
between uncured glue and the dispersion. A 0.020 × 0.200 × 50 mm³ capillary
(VitroCom 5002-050, Kimtech cleaned) was half-filled with the dispersion. Next,
the capillary was pressed onto a microscopy slide (Menzel-Gläser) with tweezers
and a foam cushion and nitrogen gas was blown from the filling side to push the
dispersion to the middle of the tube. While blowing, the other side was sealed with
a glue droplet (Norland NOA81) to prevent the dispersion from flowing back.
Finally, the other side was sealed with glue, the glue was cured with ultraviolet light,
and the cells were centrifuged in centrifuge tubes (VWR SuperClear).

Interparticle distance distributions were obtained by analysing reflected light
microscopy time series with Mathematica. For each frame, the gradient transform
was computed using discrete derivatives of a Gaussian, and the gradients were
circle transformed by convolving with a circle. The original image was multiplied
pixel by pixel with the circle transform, and the local maxima were identified
as particles. The histogram for particles in the cavity phase (Fig. 3o, blue) was
normalized with a fitted function through the distribution of 10’ distances between 
random points on a plane with the same size as the microscopy images, and the
resulting histogram was scaled to 1 at large interparticle distances. The effective
pair potential (Fig. 3p) was calculated using $w(r_{ij})/k_{\mathrm{B}}T = -\ln[g(r_{ij})]$, where $g(r_{ij})$
is the measured interparticle distance distribution. The histogram for particles in
a microcapsule (Fig. 3p, red) was scaled so that the maximum has the same height
as the maximum of the cavity phase histogram.
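
The conversion from a normalized distance distribution to an effective pair potential, w/k_BT = −ln g, is a one-line operation per histogram bin. The sketch below shows it on made-up values; the function name and the sample distribution are hypothetical, and bins with g = 0 are returned as `None` because the logarithm is undefined there.

```python
import math


def effective_pair_potential(g_values):
    """Return w/(k_B T) = -ln g for each bin; bins with g <= 0 give None."""
    return [(-math.log(g) if g > 0 else None) for g in g_values]


# Hypothetical g values, normalized to 1 at large interparticle distance.
g = [0.0, 0.5, 2.0, 1.2, 1.0]
w = effective_pair_potential(g)
```

A peak in g (here the bin with g = 2.0) maps to a negative, that is attractive, value of w, whereas g = 1 gives w = 0.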

The number of particles per microcapsule was estimated from the first peak in
the interparticle distance distribution of a microcapsule, $r_{ij}$ = 0.62 μm, by assuming
a hexagonal orientation and a spherical microcapsule surface,

$$
N_{\mathrm{part}} = \frac{A_{\mathrm{tot}}}{A_{\mathrm{part}}} = \frac{4\pi R_{\mathrm{m}}^{2}}{\frac{\sqrt{3}}{2}\, r_{ij}^{2}} \qquad (1)
$$

with $A_{\mathrm{tot}}$ the surface area of a microcapsule, $A_{\mathrm{part}}$ the surface area per particle and
$R_{\mathrm{m}}$ the radius of the microcapsules. For particles with a large lobe with a diameter
of 0.540 μm, the radius of 30 microcapsules, $R_{\mathrm{m}}$ = 1.9 ± 0.4 μm, was determined as
for transmission electron microscopy (TEM). Finally, the estimated number of
particles per microcapsule was $N_{\mathrm{part}} \approx 10^{2}$.
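
The order-of-magnitude estimate from equation (1) can be reproduced directly from the two quoted lengths. In this sketch the area per particle is taken as (√3/2)r², the unit-cell area of a hexagonal lattice with nearest-neighbour spacing r; that expression is an assumption inferred from the stated hexagonal arrangement, and the function name is hypothetical.

```python
import math


def particles_per_capsule(capsule_radius, spacing):
    """Sphere area divided by the area per particle of a hexagonal lattice
    with nearest-neighbour distance `spacing` (same length unit for both)."""
    sphere_area = 4.0 * math.pi * capsule_radius ** 2
    area_per_particle = (math.sqrt(3.0) / 2.0) * spacing ** 2  # assumed form
    return sphere_area / area_per_particle


# Values quoted in the text, in micrometres.
n_part = particles_per_capsule(capsule_radius=1.9, spacing=0.62)
print(round(n_part))  # of order 10^2
```

With the ±0.4 μm spread in the capsule radius the estimate varies by roughly a factor of two either way, so only the order of magnitude is meaningful.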

Dynamic light scattering. Cross-linked poly(styrene-co-acrylic acid) spheres
were added to 1 mM KOH, and using a Malvern Zetasizer Nano ZS, equipped
with an MPT-2 autotitrator, pH was stepwise decreased from 9.8 to 3.4 by adding
1 mM HCl. Cumulant analysis gave the apparent size and polydispersity index.
The influence of electrostatic interactions was studied by replacing the solvents
with 1 mM KOH/9 mM KCl and 1 mM HCl/9 mM KCl. The influence of the carboxylic
acid groups was studied by using cross-linked polystyrene spheres instead
of CPSAA spheres.

Simulation model. We model the colloids as hard spheres of diameter $\sigma$. The
attractive forces between the particles, due to van der Waals interactions and the
hydrophobic (polystyrene) component of the brush for example, are captured using
a square-well potential:

$$
U_{\text{c-c}}(R) =
\begin{cases}
\infty & R < \sigma \\
-\varepsilon_{\mathrm{c}} & \sigma < R < \sigma + \Delta \\
0 & R > \sigma + \Delta
\end{cases}
\qquad (2)
$$

where $\Delta$ is the width of the well, $\varepsilon_{\mathrm{c}}$ is the attraction strength and $R$ is the
centre-to-centre distance between the colloids. We then model the grafted acrylic
acid polymers as penetrable hard spheres or satellite spheres with diameter relative
to the hard spheres $q$ (ref. 29) that can move freely across the surface at a fixed
distance $(\sigma + q\sigma)/2$, thereby allowing the brush to adapt its configuration to the
environment. The penetrable hard spheres are free to overlap each other, that is,
$U_{\text{PHS-PHS}} = 0$, but have a hard-sphere interaction with the colloids:

$$
U_{\text{PHS-c}}(r) =
\begin{cases}
\infty & r < (\sigma + q\sigma)/2 \\
0 & r > (\sigma + q\sigma)/2
\end{cases}
\qquad (3)
$$

where $r$ is the centre-to-centre distance between the colloid and polymer.
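
The two pair interactions defined so far, the colloid-colloid square well of equation (2) and the hard-sphere exclusion between a satellite sphere and a colloid of equation (3), translate into short functions. This is a minimal sketch rather than the authors' code: energies are in units of k_BT, lengths in units of the colloid diameter, the defaults for the well width and depth are taken from the Simulation details, q = 0.5 is an arbitrary illustrative value, and the treatment of distances exactly at a boundary is an assumption.

```python
import math


def square_well(R, sigma=1.0, delta=0.1, eps_c=9.0):
    """Colloid-colloid energy: hard core below sigma, depth -eps_c inside
    the well, zero beyond sigma + delta."""
    if R < sigma:
        return math.inf
    if R < sigma + delta:
        return -eps_c
    return 0.0


def satellite_colloid(r, sigma=1.0, q=0.5):
    """Hard-sphere exclusion between a satellite sphere and a colloid."""
    return math.inf if r < (sigma + q * sigma) / 2.0 else 0.0
```

Because every energy is either infinite, a constant or zero, a Monte Carlo move is rejected outright on any overlap and otherwise only changes the energy through the number of square-well contacts.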

To obtain snowman-shaped particles, we attach a protrusion to each colloid,
modelled here as a hard sphere with radius $r_{\mathrm{p}}$, with its centre located at the colloid
surface. The protrusion hydrophobicity is captured by a weak square-well attraction
between protrusions with strength $U_{\text{p-p}} = -\varepsilon_{\mathrm{p}}$ over the range
$2r_{\mathrm{p}} < R < 2r_{\mathrm{p}} + \Delta$. The colloids and protrusions interact through a square-well
potential with depth $U_{\text{c-p}} = -\sqrt{\varepsilon_{\mathrm{c}}\varepsilon_{\mathrm{p}}}$, that is, the geometric mean of the interaction
strengths, over the range $\sigma/2 + r_{\mathrm{p}} < R < \sigma/2 + r_{\mathrm{p}} + \Delta$. Lastly, the protrusions
have a hard-sphere interaction with the polymers:

$$
U_{\text{PHS-p}}(r) =
\begin{cases}
\infty & r < r_{\mathrm{p}} + q\sigma/2 \\
0 & r > r_{\mathrm{p}} + q\sigma/2
\end{cases}
\qquad (4)
$$

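Given a system containing $N$ particles at positions $\mathbf{R}^{N}$, each grafted with $f$ polymers
at positions $\mathbf{r}^{f}$, the total energy for the system is then given by

$$
\begin{aligned}
U(\mathbf{R}^{N}, \mathbf{r}^{f}) = {} & \sum_{i<j}^{N} \Big\{ U_{\text{c-c}}(R_{ij}) + U_{\text{c-p}}([\mathbf{R}_i + \mathbf{u}_i] - \mathbf{R}_j) \\
& + U_{\text{c-p}}(\mathbf{R}_i - [\mathbf{R}_j + \mathbf{u}_j]) + U_{\text{p-p}}([\mathbf{R}_i + \mathbf{u}_i] - [\mathbf{R}_j + \mathbf{u}_j]) \Big\} \\
& + \sum_{i,j}^{N} \sum_{k}^{f} \Big\{ U_{\text{PHS-c}}(\mathbf{r}_{i,k} - \mathbf{R}_j) + U_{\text{PHS-p}}(\mathbf{r}_{i,k} - [\mathbf{R}_j + \mathbf{u}_j]) \Big\}
\end{aligned}
\qquad (5)
$$

where $\mathbf{r}_{i,k}$ denotes the position of the $k$th polymer of the $i$th particle and
$\mathbf{u}_i = \mathbf{M}(\Omega_i)\Delta\mathbf{u}$ is the vector pointing from the centre of the colloid of the $i$th
particle to the protrusion. $\mathbf{M}(\Omega_i)$ is the rotation matrix for the orientation $\Omega_i$ of
the $i$th particle and $\Delta\mathbf{u}$ is the vector in the reference frame from the centre of the
colloid to the protrusion.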

Simulation details. We employ Monte Carlo simulations in the canonical ensemble
to study the aggregation behaviour of mutually attractive, anisotropic, deformable
colloids. We perform two types of simulations: initiating from the soluble
phase and increasing the interaction strength from $\varepsilon_{\mathrm{c}} = 3k_{\mathrm{B}}T$ to $9k_{\mathrm{B}}T$ in steps of $1k_{\mathrm{B}}T$
in system (i) and initiating from a square planar monolayer in a hexagonal packing
arrangement with fixed interaction strength $\varepsilon_{\mathrm{c}} = 9k_{\mathrm{B}}T$ in system (ii). We set
$N = 98$ and the box length $L$ is set such that the colloid packing fraction equals
$\phi_{\mathrm{c}} = 0.001$; periodic boundary conditions apply in all directions. The protrusions
have a radius $r_{\mathrm{p}} = 0.35\sigma$, they are weakly attractive relative to the colloids ($\varepsilon_{\mathrm{p}} = \varepsilon_{\mathrm{c}}/5$)
and are randomly oriented below and above the monolayer in system (ii). Lastly,
the square-well width is set to $\Delta = 0.1\sigma$.
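
The box length follows from the particle number and the colloid packing fraction. The sketch below assumes a cubic box and counts only the colloid cores of diameter σ towards the packing fraction (protrusions and satellite spheres are ignored); both are assumptions, as is the function name.

```python
import math


def box_length(n_particles, packing_fraction, sigma=1.0):
    """Edge of a cubic box in which n_particles spheres of diameter sigma
    occupy the given volume fraction."""
    sphere_volume = math.pi * sigma ** 3 / 6.0
    return (n_particles * sphere_volume / packing_fraction) ** (1.0 / 3.0)


L = box_length(98, 0.001)
print(f"L = {L:.2f} sigma")
```

Under these assumptions the box edge is about 37 colloid diameters, so the fixed maximum displacements of 0.25L and 0.15L quoted for the Monte Carlo moves correspond to several particle diameters.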

We perform $4 \times 10^{4}$ equilibration Monte Carlo cycles and another $150 \times 10^{4}$
production cycles for each step in system (i). For each Monte Carlo cycle we
attempt 50 colloid displacement moves over a fixed maximum distance ($0.25L$),
50 colloid moves with variable maximum displacement such that $25\% < P_{\mathrm{acc}} < 40\%$,
50 quaternion rotations of the protrusion and another 50 of the entire nanoparticle,
50 cluster moves (ref. 30) over a fixed distance ($0.15L$) and $50f/2$ quaternion rotations of
polymers. We disable the expensive cluster moves for system (ii), allowing us to
greatly increase the number of equilibration cycles to 2 x 10° and the number of
production cycles to 10 x 10°. 

For both systems we perform simulations for every combination of functionality
f ∈ {2, 4, 6, 8, 10, 12} and polymer size q ∈ {0.20, 0.25, ..., 0.70}, and we repeat
all simulations in the absence of protrusions. We repeat the simulations initiated 
from the hexagonal monolayer an additional four times and all of the resulting 
data plots and morphologies are averages over these runs. 

Simulation data analysis. We evaluate the average number of bonds per particle
using the expression

$$
\langle N_{\mathrm{b}} \rangle = \frac{1}{N} \sum_{i \neq j}^{N} \Theta(\sigma + \Delta - r_{ij}) \qquad (6)
$$

where $\Theta$ is the Heaviside step function. Here, particles are in contact when the
centre-to-centre distance $r_{ij} = |\mathbf{r}_i - \mathbf{r}_j|$ between their bodies is less than $\sigma + \Delta$.
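
The bond count of equation (6) amounts to counting, for every particle, the neighbours whose centres lie within the contact distance and averaging over particles. The following sketch does this with a plain double loop; it ignores periodic images, which the actual simulations would need to include, and the function name and test configuration are hypothetical.

```python
import math


def average_bonds(positions, sigma=1.0, delta=0.1):
    """Mean number of neighbours closer than sigma + delta per particle.
    Periodic boundary conditions are not applied in this sketch."""
    n = len(positions)
    cutoff = sigma + delta
    count = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(positions[i], positions[j]) < cutoff:
                count += 1
    return count / n


# Three particles in a row: the middle one has two bonds, the ends one each.
chain = [(0.0, 0.0, 0.0), (1.05, 0.0, 0.0), (2.1, 0.0, 0.0)]
mean_bonds = average_bonds(chain)
```

Each bonded pair is counted once for each of its two members, so a linear chain of three particles gives 4/3 bonds per particle.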

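We define $Q$ as the surface which is covered by surface groups divided by the
total available surface,

$$
Q = \frac{f A_{\mathrm{s}}}{A_{\mathrm{tot}} - A_{\mathrm{ex}}} \qquad (7)
$$

where $A_{\mathrm{s}}$ is the surface area covered by a satellite sphere, $A_{\mathrm{tot}} = 4\pi((\sigma + \sigma q)/2)^{2}$
is the total area of the sphere over which the satellite spheres move and $A_{\mathrm{ex}}$ is the
surface area excluded by the presence of the protrusion.

$A_{\mathrm{s}}$ and $A_{\mathrm{ex}}$ are the curved surface areas of spherical caps. These areas are given by
$A_{\mathrm{cap}} = 2\pi R_{\mathrm{cap}} h_{\mathrm{cap}}$, where $R_{\mathrm{cap}}$ is the radius of the sphere and $h_{\mathrm{cap}} = R_{\mathrm{cap}} - R_{\mathrm{cap}}\cos\theta_{\mathrm{cap}}$
is the height of the cap. $\theta_{\mathrm{cap}}$ is the angle between the edge of the cap, the centre of the
sphere and the centre of the cap. Inserting this into the equation for $A_{\mathrm{cap}}$ gives

$$
A_{\mathrm{cap}} = 2\pi R_{\mathrm{cap}}^{2} (1 - \cos\theta_{\mathrm{cap}}) \qquad (8)
$$

For the satellite spheres, $R_{\mathrm{cap}} = (\sigma + \sigma q)/2$ and the law of cosines gives

$$
\cos\theta_{\mathrm{cap}} = \frac{\left(\frac{\sigma + q\sigma}{2}\right)^{2} + \left(\frac{\sigma + q\sigma}{2}\right)^{2} - \left(\frac{q\sigma}{2}\right)^{2}}{2\left(\frac{\sigma + q\sigma}{2}\right)^{2}} = 1 - \frac{q^{2}}{2(1+q)^{2}} \qquad (9)
$$

and for the excluded surface area, $R_{\mathrm{cap}} = (\sigma + \sigma q)/2$ and

$$
\cos\theta_{\mathrm{cap}} = \frac{\left(\frac{\sigma}{2}\right)^{2} + \left(\frac{\sigma + q\sigma}{2}\right)^{2} - \left(r_{\mathrm{p}} + \frac{q\sigma}{2}\right)^{2}}{2\left(\frac{\sigma}{2}\right)\left(\frac{\sigma + q\sigma}{2}\right)} = 1 - \frac{2pq + p^{2}}{2(1+q)} \qquad (10)
$$

with $p = 2r_{\mathrm{p}}/\sigma$ the dimensionless protrusion diameter.
Inserting equations (8)-(10) into equation (7) gives

$$
Q = \frac{f q^{2}}{(1+q)(4 + 4q - 2pq - p^{2})} \qquad (11)
$$

For particles without protrusions, $p = 0$, this reduces to

$$
Q = \frac{f q^{2}}{4(1+q)^{2}} \qquad (12)
$$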
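
As a consistency check on the derivation of the covered surface fraction, the closed form of equation (11) can be compared with a direct evaluation from the spherical-cap areas of equations (7)-(10). The sketch below does both; the function names are hypothetical, and the example values f = 6 and q = 0.5 are arbitrary, whereas p = 0.7 corresponds to the protrusion radius of 0.35 colloid diameters used in the simulations.

```python
import math


def covered_fraction(f, q, p=0.0):
    """Closed form: Q = f q^2 / ((1 + q)(4 + 4q - 2pq - p^2))."""
    return f * q ** 2 / ((1.0 + q) * (4.0 + 4.0 * q - 2.0 * p * q - p ** 2))


def covered_fraction_from_caps(f, q, p=0.0, sigma=1.0):
    """Same quantity assembled from the spherical-cap areas."""
    a = (sigma + q * sigma) / 2.0  # radius of the shell the satellites move on
    r_p = p * sigma / 2.0          # protrusion radius
    cos_s = (2.0 * a ** 2 - (q * sigma / 2.0) ** 2) / (2.0 * a ** 2)
    cos_ex = (((sigma / 2.0) ** 2 + a ** 2 - (r_p + q * sigma / 2.0) ** 2)
              / (2.0 * (sigma / 2.0) * a))
    area_s = 2.0 * math.pi * a ** 2 * (1.0 - cos_s)
    area_ex = 2.0 * math.pi * a ** 2 * (1.0 - cos_ex)
    area_tot = 4.0 * math.pi * a ** 2
    return f * area_s / (area_tot - area_ex)


q_iso = covered_fraction(6, 0.5)          # no protrusion, equation (12)
q_aniso = covered_fraction(6, 0.5, 0.7)   # with protrusion, equation (11)
```

For p = 0 the excluded cap vanishes and the result equals f q²/(4(1+q)²), here 1/6; adding the protrusion removes available surface and therefore raises Q for the same f and q.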


28. van Ravensteijn, B. G. P., Kamp, M., van Blaaderen, A. & Kegel, W. K. General
route toward chemically anisotropic colloids. Chem. Mater. 25, 4348-4353 
(2013). 

29. Asakura, S. & Oosawa, F. On interaction between two bodies immersed in a 
solution of macromolecules. J. Chem. Phys. 22, 1255 (1954). 

30. Bhattacharyay, A. & Troisi, A. Self-assembly of sparsely distributed molecules: 
an efficient cluster algorithm. Chem. Phys. Lett. 458, 210-213 (2008). 
Extended Data Figure 1 | pH-induced structural rearrangements.
a, For poly(styrene-co-acrylic acid) spheres with a TEM diameter
d = 0.530 ± 0.014 μm, the apparent hydrodynamic diameter, d_hd, is
measured using dynamic light scattering. At ionic strength I ~ 1 mM,
d_hd equals 0.79 μm at pH 10, but decreases to 0.57 μm at pH 3 (blue).
On screening electrostatic interactions at I ~ 10 mM (red), or for
polystyrene spheres without acrylic acid (green), however, the measured
diameter remains almost constant with pH. We conclude that at high pH,
the electrostatic repulsion between acrylic acid groups triggers the
poly(acrylic acid)-rich brush to expand by about 0.1 μm into the solution.
b, The measured polydispersity index, PdI, stays constant, indicating that
changing the pH does not induce aggregation.
Extended Data Figure 2 | Synthesis. a-g, Schematic outline (a) and
microscopy images (b-g) of the synthesis of mutually attractive,
anisotropic, deformable particles. Poly(styrene-co-acrylic acid) spheres
(b, c, TEM) with a hydrophobic core (red in a) and a deformable brush
(blue in a) are swollen with monomer, heated, and polymerized, resulting
in snowman-like particles (d, e, TEM) with a deformable lobe and a
rigid protrusion (green in a). Hydrophobic molecules (yellow in a) are
covalently linked to the acrylic acid groups, resulting in fluorescent
particles when fluoresceinamine is used (f, g, fluorescence microscopy).
Extended Data Figure 3 | SEM images of self-assembled microcapsules.
To prevent disintegration upon drying, microcapsules are studied after
sintering (a-e) or freeze-drying (f-j). During sintering, the solvent (light
blue) is heated in order to partly merge the particles (red) (a, b). During
freeze-drying, vitrified water (dark blue) is sublimated under vacuum
(f, g). Particles in the microcapsules have six (blue asterisks) or five (green
squares) neighbours, and both protrusions that point slightly inwards
(red circles) and outwards (light red triangles) are found (d, i). Besides
microcapsules, also planar monolayers (j) are observed.


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


Extended Data Figure 4 | Formation of microcapsules and cavities on 
varying the complexity of the particles. a-d, The complexity of particles 
(a) that are deformable (blue), anisotropic (green) and functionalized 
with mutually attractive groups (yellow) is varied, resulting in non- 
functionalized particles (b), isotropic particles (c) and non-deformable 
particles (d). Microcapsules are only found in the first case. e-v, All 
particles are studied using bright field microscopy at the edge of an 
evaporating droplet (e-h), in a sediment after centrifugation (i-p), and 
upon diluting the sediment (q-v). The entire images of e-h can be found 
in Extended Data Fig. 6 and magnifications of i-k can be found in 
Extended Data Fig. 7b-d. The arrows indicate the directions of the particle 
flow, ?, or the gravitational field, ¢. 
Extended Data Figure 5 | Monolayer sheets. For mutually attractive,
anisotropic, deformable particles with varying sizes (a-c), not only
hollow microcapsules (Fig. 2), but also two-dimensional hexagonal 
planar monolayers (e-g) are observed using bright field microscopy. 
Both fluoresceinamine (a-c, e-g) and tert-butylamine (d, h) are used as 
hydrophobic moieties and for both moieties monolayers are observed. 
Extended Data Figure 6 | Formation of cavities at the contact line. The
complexity of mutually attractive, anisotropic, deformable particles (a-d)
is varied resulting in non-functionalized particles (e-h), isotropic particles
(i-l) and non-deformable particles (m, n). For each particle type, part of
the contact line of an evaporating droplet is studied four times (twice for
non-deformable particles). Many cavities are found for mutually attractive,
anisotropic, deformable particles (red arrows), whereas for isotropic 
particles many fewer cavities are found and the other particles did not 
show any cavities. Crops of these images can be found in Extended Data 
Fig. 4. The white arrows indicate the direction ?. 
Extended Data Figure 7 | Centrifuged sediments. Magnifications of the bright field microscopy images in Fig. 3r (a) and Extended Data Fig. 4i-k (b-d).
a-d, For mutually attractive, anisotropic, deformable particles, spherical cavities are observed in the sediment (a, b), whereas the sediments of similar 
non-functionalized and isotropic particles show no (c) and fewer (d) cavities. 
Extended Data Figure 8 | Clusters of isotropic particles.
a, b, Morphology diagrams of mutually attractive, isotropic, deformable
particles as a function of the dimensionless diameter of the satellite
spheres, q, and the number of satellite spheres, f. c-k, Representative
snapshots with cores (red) and satellite spheres (blue). a, When unbound
particles are used as the initial configuration and q and f are increased,
compact (c, open circles), cylindrical (d, open triangles), flattened (e, filled 
circles), rod-like (f, asterisks) and finite-size (g, open squares) clusters as 
well as unbound particles (h, filled triangles) are found. b, When the initial 
configuration is a hexagonal monolayer, compact clusters (i, open circles), 
bilayers (j, open diamonds) and monolayers (k, crosses) are observed. The 
transitions between different morphologies are parallel to isolines for the 
covered surface fraction, Q=0.1 to 0.5 (dashed lines in a and b). 
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Extended Data Figure 9 | Clusters of anisotropic particles. 

a, b, Morphology diagrams of clusters of mutually attractive, anisotropic, 
deformable particles as a function of the dimensionless diameter of 

the satellite spheres, q, and the number of satellite spheres, f. c—n, 
Representative snapshots with cores (red), protrusions (green) and 
satellite spheres (blue). a, When unbound particles are used as the initial 
configuration, increasing f and q results in compact (c, open circles), 
cylindrical (d, open triangles), rod-like (e, asterisks), flattened (f, filled 
circles) and finite-size (g, open squares) clusters as well as unbound 
particles (h, filled triangles). b, When the initial configuration is a 
hexagonal monolayer, compact clusters (i, open circles), bilayers (j, open 
diamonds), planar monolayers (k, crosses), curved monolayers with 
in-plane protrusions (l, bisected circles) and curved monolayers with 
out-of-plane protrusions (m–n, filled squares) are found. The transitions 
between different morphologies are parallel to isolines for the covered 
surface fraction, Q=0.1 to 0.7 (dashed lines in a and b). 
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Extended Data Figure 10 | Functionalized CPSAA spheres. 

a-d, Fluorescence microscopy images for variations on the 
fluoresceinamine coupling method. e, f, Normalized fluorescence 
intensity, I/Imax, as a function of the distance, x, on the horizontal line 
through the fluorescence maximum. Poly(styrene-co-acrylic acid) 
spheres were activated and functionalized as described in Methods 
(a, e, f, blue asterisks). Polystyrene spheres were similarly activated and 
functionalized (b, e, f, red circles). CPSAA was similarly treated without 
adding N-(3-dimethylaminopropyl)-N’-ethylcarbodiimide hydrochloride 
(c, e, f, green squares). CPSAA was similarly treated without adding 
fluoresceinamine (d-f, light red triangles). The vertical bars indicate the 
image level thresholds. 
Concerted nucleophilic aromatic substitution with ¹⁹F⁻ and ¹⁸F⁻

Constanze N. Neumann, Jacob M. Hooker & Tobias Ritter


Nucleophilic aromatic substitution (SNAr) is widely used by organic 
chemists to functionalize aromatic molecules, and it is the most 
commonly used method to generate arenes that contain ¹⁸F for use 
in positron-emission tomography (PET) imaging¹. A wide range of 
nucleophiles exhibit SNAr reactivity, and the operational simplicity 
of the reaction means that the transformation can be conducted 
reliably and on large scales². During SNAr, attack of a nucleophile 
at a carbon atom bearing a ‘leaving group’ leads to a negatively 
charged intermediate called a Meisenheimer complex. Only arenes 
with electron-withdrawing substituents can sufficiently stabilize 
the resulting build-up of negative charge during Meisenheimer 
complex formation, limiting the scope of SNAr reactions: the most 
common SNAr substrates contain strong π-acceptors in the ortho 
and/or para position(s)³. Here we present an unusual concerted 
nucleophilic aromatic substitution reaction (CSNAr) that is not 
limited to electron-poor arenes, because it does not proceed via a 
Meisenheimer intermediate. We show a phenol deoxyfluorination 
reaction for which CSNAr is favoured over a stepwise displacement. 
Mechanistic insights enabled us to develop a functional-group-tolerant 
¹⁸F-deoxyfluorination reaction of phenols, which can 
be used to synthesize ¹⁸F-PET probes. Selective ¹⁸F introduction, 
without the need for the common, but cumbersome, azeotropic 
drying of ¹⁸F, can now be accomplished from phenols as starting 
materials, and provides access to ¹⁸F-labelled compounds not 
accessible through conventional chemistry. 

SNAr reactions generally take place via either an addition-elimination 
or elimination-addition mechanism. Both two-step mechanisms display 
a high-energy intermediate, either an aryne species (elimination-addition) 
or a Meisenheimer complex (addition-elimination). 
A concerted displacement of the leaving group by an incoming nucleophile 
could avoid the formation of high-energy intermediates and thus 
broaden the scope of suitable electrophiles. Displacements at primary 
aliphatic centres, where charge build-up in a hypothetical SN1 mechanism 
is unfavourable, commonly take place via a concerted mechanism 
involving the σ* (C_alkyl–leaving group (LG)) orbital (SN2 mechanism). 
For aromatic substrates, a direct substitution pathway involving the σ* 
orbital of the arene–LG bond (σ* (C_aryl–LG)) is deemed to be impossible: 
the σ* orbital is shielded because its large lobe points inwards into 
the arene ring (Fig. 1a). Concerted SNAr substitutions via the π-orbital 
framework are considered “possible but restricted to aromatic structures 
devoid of the ring activation to generate an intermediate sigma 
complex of some stability”. The intramolecular Newman-Kwart rearrangement 
has been reported to occur through concerted displacement 
for a wide range of arene substrates, albeit mostly with high activation 
barriers (35–43 kcal mol⁻¹) that reduce synthetic utility. Here we 
show that the deoxyfluorination reaction of phenols with the reagent 
PhenoFluor (Fig. 2b) reported by our group⁷,⁸ proceeds via a concerted 
pathway with electron-rich as well as electron-poor substrates, and how 
a detailed mechanistic analysis enabled us to design a deoxyfluorination 
reaction of phenols with ¹⁸F. A concerted reaction with activation 
energies between 20 and 25 kcal mol⁻¹ is observed because the 
concerted pathway is favoured, rather than because the classic two-step 
mechanism is disfavoured, which sets our reaction apart from previous 
transformations that proceed with substantially higher activation 
Figure 1 | Comparison of orbital interactions and energy profiles 
in SNAr and CSNAr. a, The aromatic ring blocks the approach of the 
nucleophile to the σ*(C–LG) orbital; attack on the π-framework is feasible. 
b, The energy profiles of SNAr and CSNAr (plotted against the reaction 
coordinate) differ in the number of transition states and in the 
magnitude of the activation energies. 
c, Minimization of charge build-up in the transition state renders 
nucleophilic displacement feasible even on electron-rich arenes in 
CSNAr reactions. 
Figure 2 | Proposed mechanism of PhenoFluor-mediated 
deoxyfluorination. a, After formation of uronium intermediate 1, 
external CsF abstracts hydrogen fluoride (HF) to form tetrahedral 
adduct 2, which undergoes concerted nucleophilic substitution via 
fluoride shift (Ar = 2,6-diisopropylphenyl). b, The intrinsic reaction 
coordinate obtained from DFT calculations (B3LYP/6-31G(d), toluene 
solvent model) shows a single barrier between tetrahedral adduct 2 and 
reaction products (Ar = 2,6-diisopropylphenyl). Structures obtained from 
DFT calculations are shown for 2 and TS. ΔG‡ = 21.8 ± 0.2 kcal mol⁻¹ 


barriers. Gas-phase nucleophilic aromatic substitutions can take 
place by concerted nucleophile attack and loss of the leaving group, 
but only isolated cases of intermolecular CSNAr reactions in solution 
or ionic melt have been reported. 
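
To give a feel for what the quoted barrier ranges mean in practice, the following is a minimal sketch that converts a free energy of activation into a first-order rate constant with the Eyring equation, k = (k_B T / h) exp(−ΔG‡ / RT). It assumes a transmission coefficient of 1 and a temperature of 110 °C (the reaction temperature quoted elsewhere in the text); the function names are illustrative and not taken from the paper.

```python
import math

# Physical constants
K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J*s
R = 1.987204e-3       # gas constant, kcal/(mol*K)


def eyring_rate(dg_kcal_per_mol, temp_kelvin):
    """First-order rate constant (1/s) from the Eyring equation.

    Assumes a transmission coefficient of 1.
    """
    prefactor = K_B * temp_kelvin / H
    return prefactor * math.exp(-dg_kcal_per_mol / (R * temp_kelvin))


def half_life_seconds(dg_kcal_per_mol, temp_kelvin):
    """Half-life of a first-order process with the given barrier."""
    return math.log(2) / eyring_rate(dg_kcal_per_mol, temp_kelvin)


if __name__ == "__main__":
    temperature = 383.15  # 110 degrees C, assumed for illustration
    # Barrier ranges quoted in the text: 20-25 and 35-43 kcal/mol
    for barrier in (20.0, 25.0, 35.0, 43.0):
        k = eyring_rate(barrier, temperature)
        t_half = half_life_seconds(barrier, temperature)
        print(f"dG = {barrier:4.1f} kcal/mol  k = {k:.3e} 1/s  t1/2 = {t_half:.3e} s")
```

Because the rate depends exponentially on the barrier, a difference of 10 to 20 kcal mol⁻¹ corresponds to many orders of magnitude in rate at a fixed temperature, which is the practical content of the comparison with the Newman-Kwart rearrangement.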

The orbital interactions involved in a concerted mechanism are 
similar to those of classical addition-elimination pathways, but the 
extent of bond formation and cleavage in the transition state is crucially 
different: in the transition state of CSNAr, both the nucleophile and leav- 
ing group are attached to the arene by partial rather than full bonds. Loss 
of the leaving group in the rate-determining step allows the negative 
charge associated with nucleophilic attack to be located on the incoming 
nucleophile and the departing leaving group, as opposed to the arene 
in conventional SNAr. We propose that selection of leaving groups 
and reaction conditions tailored to a concerted displacement make it 
possible to utilize the minimization of charge build-up on the arene 
to lower the activation barrier (Fig. 1b), which expands the scope of 
was measured for the transformation of 2 to aryl fluoride and urea 3. 
c, The primary ¹⁶O/¹⁸O kinetic isotope effect is consistent with cleavage 
of the C–O bond during the rate-limiting step (Supplementary Fig. 15). 
Silylated phenols react with PhenoFluor to form tetrahedral intermediate 
without CsF. d, Hammett plot for the deoxyfluorination of para-substituted 
phenols at 110 °C. e, Regioselective product formation occurs for 
substrates prone to nucleophilic attack at position b if arynes were 
formed. 


electrophiles to include deactivated substrates that feature strong π-donors 
in the para-position (Fig. 1c). The PhenoFluor-mediated deoxyfluorination 
reaction allows the conversion of 4-hydroxyanisole 
to 4-fluoroanisole at only 110 °C (refs 7, 8), far below the temperature 
commonly observed for aromatic substitutions on unactivated 
arenes. 

We propose a reaction sequence for the deoxyfluorination reaction 
in which fluoride attacks the imidazolium core of the reagent to yield 
tetrahedral intermediate 2 before participating in concerted displace- 
ment on the arene (Fig. 2a); independently synthesized and charac- 
terized tetrahedral intermediate 2 is converted to aryl fluoride and 
urea 3 under the reaction conditions (Fig. 2b). A single transition state 
(TS) was localized in a density functional theory (DFT) study 
(B3LYP/6-311++G(d,p), toluene solvent model) with partial bonds between 
the nucleophile and arene as well as the leaving group and the arene, 
the characteristic feature of a concerted transformation. An internal 
Figure 3 | ¹⁸F isotopologue 2. a, CsF abstracts HF from the HF₂⁻ 
counteranion: without CsF, deoxyfluorination occurs via a different 
mechanism in which HF₂⁻ attacks the arene. DFT studies reveal that the 
barrier for C–F bond formation is 6.0 kcal mol⁻¹ lower with a fluoride 
instead of a bifluoride nucleophile (see Supplementary Fig. 29). 


reaction coordinate analysis revealed that the transition state connects 
tetrahedral intermediate 2 to urea 3 and aryl fluoride, which excludes 
the existence of additional maxima along the reaction path. 

Crucial to the proposal of a concerted substitution mechanism is 
that loss of the leaving group occurs concurrently with attack of the 
incoming nucleophile. The rate observed for the fluorination of 
¹⁶O-4-phenyl-phenol is 1.08 ± 0.02 times as fast as the rate of fluorination 
of ¹⁸O-4-phenyl-phenol, corresponding to a large primary kinetic 
isotope effect (Fig. 2c). A primary ¹⁶O/¹⁸O kinetic isotope effect 
shows that cleavage of the C–O bond (and therefore loss of the LG) 
occurs during the rate-determining step. The rate of deoxyfluorination 
with PhenoFluor is greater for electron-deficient than for 
electron-rich substrates, and the continuity in the Hammett plot reveals 
that no change in mechanism or rate-determining step occurs when 
the electron density on the phenol is varied (Fig. 2d). A single electron 
transfer (SET) mechanism, in which an electron is transferred from the 
phenol arene to the positively charged imidazolium core, is inconsistent 
with the observed Hammett plot: for rate-determining electron transfer, 
reaction rates should be fastest for electron-rich substrates, which is not 
the case. SET occurring under pre-equilibrium conditions followed by 
rate-determining fluoride attack, in which case a positive ρ value would 
be expected, is unlikely due to the primary ¹⁶O/¹⁸O isotope effect. Fast 
and reversible fluoride attack followed by rate-limiting expulsion of 
the leaving group would give rise to a negative ρ value in the Hammett 
plot. The regiospecificity of the deoxyfluorination reaction discounts 
an aryne mechanism (Fig. 2e). 

Eyring plots were constructed for a selection of substrates, 
which revealed ΔG‡ = 20.3 ± 0.1 kcal mol⁻¹ for 4-nitrophenol, 
ΔG‡ = 21.0 ± 0.2 kcal mol⁻¹ for 4-cyanophenol, ΔG‡ = 21.2 ± 0.5 kcal 
mol⁻¹ for 4-trifluoromethylphenol and ΔG‡ = 23.4 ± 0.2 kcal mol⁻¹ 
for phenol, respectively. Computational activation barriers 
ΔG‡ = 20.8 kcal mol⁻¹ for 4-nitrophenol and ΔG‡ = 25.0 kcal mol⁻¹ for 
phenol are in good agreement with the experimental values. Compared 
to classical SNAr reactions, the increase in activation energies as the 
aromatic system becomes more electron-rich is far less pronounced 
for concerted SNAr reactions, which is also apparent from the smaller 
Hammett ρ values; conventional SNAr reactions have ρ values ranging 


b, Treatment of uronium 1 with ¹⁸F does not give aryl fluoride owing to 
the lack of anion exchange between X and ¹⁸F-fluoride in solution. c, No 
¹⁸F incorporation is observed. d, Anion exchange with extraneous fluoride 
takes place on an anion exchange cartridge (Ar = 2,6-diisopropylphenyl). 


from 3 to 8, compared with 1.8 for the CSNAr reaction reported here 
(Fig. 3c). Limited delocalization of negative charge onto the aromatic 
substrate in the transition state can thus extend the scope of nucleophilic 
aromatic substitution to electron-rich substrates. 
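
The effect of a smaller ρ value can be illustrated with the Hammett equation, log10(k_X / k_H) = ρσ. The sketch below compares the predicted rate spread between a strongly electron-poor and an electron-rich para substituent for ρ = 1.8 (the value quoted for this reaction) and for ρ = 3 and 8 (the conventional SNAr range quoted above). The σ_para constants are standard literature values supplied here as an assumption; the text does not say which substituent scale was used for the published plot, and σ⁻ constants are often preferred for SNAr.

```python
# Standard sigma_para constants (literature values, assumed for illustration)
SIGMA_PARA = {
    "NO2": 0.78,
    "CN": 0.66,
    "CF3": 0.54,
    "H": 0.00,
    "OMe": -0.27,
}


def relative_rate(rho, sigma):
    """k_X / k_H predicted by the Hammett equation log10(k_X/k_H) = rho * sigma."""
    return 10.0 ** (rho * sigma)


def rate_span(rho, poor="NO2", rich="OMe"):
    """Ratio of predicted rates for an electron-poor and an electron-rich substituent."""
    return relative_rate(rho, SIGMA_PARA[poor]) / relative_rate(rho, SIGMA_PARA[rich])


if __name__ == "__main__":
    for rho in (1.8, 3.0, 8.0):
        print(f"rho = {rho:3.1f}  k(NO2)/k(OMe) = {rate_span(rho):.3g}")
```

Under these assumptions the electron-rich substrate is penalized by roughly two orders of magnitude at ρ = 1.8, against three or more at the lower end of the conventional range, which is consistent with the argument that limited charge delocalization onto the arene widens the substrate scope.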

The barrier for CSNAr in the presented deoxyfluorination is low 
relative to hypothetical SNAr reactions on electron-rich arenes. First, 
facile loss of the leaving group is crucial for a concerted nucleophilic 
aromatic substitution reaction. Unlike in a two-step sequence, 
where a second smaller activation barrier is associated with loss of 
the leaving group, a concerted transformation has a single barrier to 
which nucleophilic attack, disruption of aromaticity and loss 
of the leaving group all contribute. A neutral leaving group (urea 3) 
will aid in stabilizing the partial negative charge that resides on 
both the nucleophile and the leaving group in the transition state. 
An earlier transition state with a lower reaction barrier will occur 
for CSNAr reactions if loss of the leaving group is energetically 
favourable. Formation of urea 3 is highly exergonic, and because 
partial C–O cleavage occurs in the transition state, the exergonicity of 
the overall transformation is expected to lower the activation barrier 
for deoxyfluorination, an effect also apparent in the ¹⁸F displacement 
of arenes from triarylsulfonium salts. Compared with the Newman-Kwart 
rearrangement, which can take place on substrates deactivated 
by electron-donating substituents, the PhenoFluor-mediated 
deoxyfluorination proceeds with considerably lower reaction barriers, 
probably due to the higher enthalpic gain associated with leaving 
group loss. Second, rearrangement of solvent molecules is commonly 
a large contributor to the activation barriers of nucleophilic aromatic 
substitutions, particularly when anionic nucleophiles are employed. 
Association of the (bi)fluoride nucleophile with the cationic uronium 
1 solubilizes the nucleophile in the non-polar solvent toluene, and can 
subsequently form neutral tetrahedral adduct 2. We propose that the 
contribution of solvation to the activation barrier is small because 
neither the associated reaction partners nor the transition state carry 
an overall charge and little nuclear motion is required to proceed 
from 2 to TS. Computational data indicate that the use of a non-polar 
solvent favours the occurrence of a concerted deoxyfluorination reac- 
tion (Supplementary Fig. 46). 
Figure 4 | Deoxyfluorination of phenols and heterophenols with ¹⁸F. 
a, Decay-corrected radiochemical conversions were determined by 
comparing the amount of ¹⁸F incorporated into the product to the amount 
of ¹⁸F not incorporated. b, Electron-rich phenols will result in a smaller 


¹⁸F-fluoride is a desirable nucleophile for the development of CSNAr 
reactions, particularly concerted deoxyfluorination: phenols are easily 
accessible and their high polarity facilitates purification of aryl fluoride 
product from phenol starting material. However, in addition to the two 
equivalents of fluorine inherent to PhenoFluor itself, additional fluoride 
must be added for efficient deoxyfluorination (Fig. 3a), which, a priori, 
renders PhenoFluor-mediated deoxyfluorination effectively useless for 
¹⁸F chemistry. Even attempts towards a low specific-activity 
radiodeoxyfluorination initially proved fruitless: neither isolated reaction 
intermediate 1 (and derivatives featuring different counteranions) nor 
tetrahedral intermediate 2 reacted with external ¹⁸F-fluoride to yield 
¹⁸F-aryl fluoride products (Fig. 3b, c). Mechanistic work (Supplementary 
Information) revealed that fluoride was not incorporated into tetrahedral 
intermediate 2 via attack by external fluoride on uronium 1 or 
anion metathesis; instead the fluoride on the aryl fluoride originated 
from PhenoFluor. We thus devised a strategy to alter the mechanism 
of fluoride incorporation into 2 to access ¹⁸F-2 in high specific activity: 
while anion exchange of 1 with ¹⁸F does not occur in solution, productive 
anion exchange occurs on an anion exchange cartridge (Fig. 3d). 

¹⁸F-fluoride is typically prepared by proton bombardment of ¹⁸O-H₂O, 
and ¹⁸F-fluoride is subsequently trapped on an ion-exchange cartridge. 
Elution of the radioisotope is commonly achieved with an aqueous 
solution of a base. Here we can use uronium 5 directly for elution of 
equilibrium constant K, resulting in fluoride expulsion and decomposition 
before productive deoxyfluorination from tetrahedral intermediate ¹⁸F-2 
can occur. Ar = 2,6-diisopropylphenyl. 


¹⁸F-fluoride from the anion exchange cartridge. Uronium 5 can readily 
be prepared from chloroimidazolium chloride 4 and a suitable phenol 
and used after simple filtration. The elution procedure obviates the 
need for azeotropic drying of ¹⁸F-fluoride, and subsequent heating of 
the resulting solution of ¹⁸F-2 directly provides aryl fluoride. 

No special care is required to exclude air or moisture from the 
¹⁸F-deoxyfluorination reaction, and the radiolabelled product can be 
conveniently separated from the reaction precursor. A wide variety 
of functional groups including amines and phenols as well as 
thioethers and amides are tolerated, and arenes as well as heteroarenes 
undergo radio-deoxyfluorination with high radiochemical conversion 
(Fig. 4a). Substrates containing carboxylic acids did not undergo 
¹⁸F-deoxyfluorination because carboxylic acids inhibit the formation 
of uronium 5. Competing nucleophilic aromatic substitution 
of activated chloride does not occur under the reaction conditions. 
Classical SNAr chemistry is the most widely applied method for the 
synthesis of PET probes but suffers from a very limited reaction scope, 
and protic functional groups are commonly not tolerated. Modern 
methods, while capable of introducing ¹⁸F-fluoride into a more 
diverse range of structures, often suffer from the need for complex 
starting materials, operating or purification procedures. Heterocycles 
are present in many bioactive compounds but are often problematic 
substrates for metal-mediated fluorination protocols with ¹⁸F; 
several heterocycles undergo PhenoFluor deoxyfluorination with high 
radiochemical conversion. To highlight the operational simplicity of 
¹⁸F-deoxyfluorination, ¹⁸F-5-fluorobenzofurazan was synthesized from 
34 mCi aqueous ¹⁸F-fluoride and subjected to high-performance liquid 
chromatography purification. Within 34 min from the end of bombardment, 
9.3 mCi of isolated and purified ¹⁸F-5-fluorobenzofurazan could 
be obtained in 27% non-decay-corrected radiochemical yield (RCY) 
with a specific activity of 3.03 Ci µmol⁻¹. 
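
The quoted 27% figure can be checked with a short calculation. The sketch below computes the non-decay-corrected yield directly from the two activities given in the text and, for comparison, a decay-corrected yield. The decay correction rests on two assumptions that are not stated in the text: that the 34 mCi starting activity refers to the start of the 34 min interval, and a ¹⁸F half-life of 109.77 min (a standard literature value).

```python
F18_HALF_LIFE_MIN = 109.77  # literature half-life of fluorine-18, minutes (assumed)


def non_decay_corrected_yield(product_activity, starting_activity):
    """Isolated activity divided by starting activity, ignoring decay."""
    return product_activity / starting_activity


def decay_corrected_yield(product_activity, starting_activity, elapsed_min,
                          half_life_min=F18_HALF_LIFE_MIN):
    """Yield relative to the starting activity that would remain after decay."""
    remaining_fraction = 0.5 ** (elapsed_min / half_life_min)
    return product_activity / (starting_activity * remaining_fraction)


if __name__ == "__main__":
    start_mci, product_mci, elapsed = 34.0, 9.3, 34.0  # values quoted in the text
    ndc = non_decay_corrected_yield(product_mci, start_mci)
    dc = decay_corrected_yield(product_mci, start_mci, elapsed)
    print(f"non-decay-corrected RCY: {ndc:.1%}")
    print(f"decay-corrected RCY (assumed timing): {dc:.1%}")
```

The non-decay-corrected value is the one that matters operationally, because it reflects the activity actually available for imaging once synthesis and purification are complete.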

We have established that tetrahedral intermediate 2 is in equilibrium 
with uronium fluoride 6 (Fig. 4b and Supplementary Fig. 36). Clean 
first-order decay of 2 was observed in the presence of added fluoride, 
but a marked deviation from first-order kinetics was observed for the 
deoxyfluorination of silylated phenols in the absence of added fluo- 
ride. Hence, the fluoride anion in 6 probably engages in unproductive 
processes, such as precipitation or other fluoride sequestrations. In ¹⁹F 
deoxyfluorination, excess CsF negates such potential side reactions, 
but for radiofluorination, fluoride is present in small quantities (nmol). 
For most compounds shown in Fig. 4, potential decomposition of 2 
does not disrupt productive fluorination, but when more electron-rich 
phenols are employed, the equilibrium constant K between 6 and 
tetrahedral intermediate 2 decreases. We have already shown that 
more electron-rich substrates can afford acceptable radiochemical 
conversions, when the conversion is based on soluble fluoride (Fig. 4b). 
While fluoride sequestration from 6 currently precludes the isolation of 
electron-rich ¹⁸F aryl fluorides in high radiochemical yields, efficient 
C–¹⁸F bond formation bodes well for mechanism-based strategies to 
increase K, which would render electron-rich arenes accessible. 
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Seafloor geodetic constraints on interplate coupling 
of the Nankai Trough megathrust zone 


Yusuke Yokota, Tadashi Ishikawa, Shun-ichi Watanabe, Toshiharu Tashiro & Akira Asada 


Interplate megathrust earthquakes have inflicted catastrophic 
damage on human society. Such an earthquake is predicted to 
occur in the near future along the Nankai Trough off southwestern 
Japan, an economically active and densely populated area in which 
megathrust earthquakes have already occurred¹⁻⁵. Megathrust 
earthquakes are the result of a plate-subduction mechanism and 
occur at slip-deficit regions (also known as ‘coupling’ regions)⁶,⁷, 
where friction prevents plates from slipping against each other and 
the accumulated energy is eventually released forcefully. Many studies 
have attempted to capture distributions of slip-deficit rates (SDRs) 
in order to predict earthquakes⁸⁻¹⁰. However, these studies could not 
obtain a complete view of the earthquake source region, because they 
had no seafloor geodetic data. The Hydrographic and Oceanographic 
Department of the Japan Coast Guard (JHOD) has been developing 
a precise and sustainable seafloor geodetic observation network¹¹ 
in this subduction zone to obtain information related to offshore 
SDRs. Here, we present seafloor geodetic observation data and an 
offshore interplate SDR-distribution model. Our data suggest that 
most offshore regions in this subduction zone have positive SDRs. 
Specifically, our observations indicate previously unknown regions 
of high SDR that will be important for tsunami disaster mitigation, 
and regions of low SDR that are consistent with distributions of 
shallow slow earthquakes and subducting seamounts. This is the first 
direct evidence that coupling conditions might be related to these 
seismological and geological phenomena. Our findings provide 
information for inferring megathrust earthquake scenarios and 
interpreting research on the Nankai Trough subduction zone. 

Recurring interplate megathrust earthquakes have occurred along 
the Nankai Trough subduction zone between the Philippine Sea 
plate and the Amur plate, and the next earthquake in this region 
is predicted to occur in the near future. This subduction zone is 
frequently discussed in terms of segmented source regions known as 
the Nankaido, Tonankai and Tokai regions, and the past 300 years of 
historical records describe the occurrence of earthquakes of magnitude 
8 in these segments (notably the 1707 Hoei, 1854 Ansei-I and 
Ansei-II, 1944 Tonankai and 1946 Nankaido earthquakes). Moreover, 
earthquakes with magnitudes up to 9 are thought to have occurred 
along each segment. 

Because megathrust earthquakes are driven by accumulated interplate 
slip deficit, these historical earthquakes are thought to have 
occurred on an interplate boundary with a high SDR⁶,⁷. Therefore, 
to assess the scale of future earthquakes and associated tsunamis, 
it is necessary to understand the whole interplate distribution of SDRs. 
Although many geodetic studies have attempted to obtain this information 
for the Nankai Trough, they have not been successful. This is 
because the previous geodetic observation network was biased to land 
areas and so could not capture total geodetic information on the seafloor 
above the interplate boundary⁸⁻¹⁰. Although small-scale seafloor 
geodetic observations have been carried out, they were limited to the 
Kumano-nada region. 


Accordingly, over the past decade we have taken a new approach to 
obtaining total seafloor geodetic information, by means of a broad-scale 
seafloor observation network using a combined global positioning system 
and acoustic ranging (GPS-A) technique. We have improved 
the precision and frequency of our GPS-A observations since 2000 
and they now rank among the highest global standards (see Methods and 
Extended Data Fig. 1). 

We observe 15 seafloor sites in a wide seafloor region along the 
Nankai Trough (Fig. 1). Six sites were established before the 2011 
Tohoku-oki earthquake. Nine further sites were deployed subsequently 
because data from the original six sites were insufficient to expose a 
complete picture of interplate SDRs. 

Extended Data Figs 2–4 show time series of the estimated horizontal 
coordinates of seafloor sites for each epoch, relative to their locations in 
the first observations at these sites. The reference frame is International 
Terrestrial Reference Frame (ITRF) 2005 (ref. 16). The positions are 
presented with respect to the stable part of the Amur plate (with stability 
determined by the MORVEL velocity model¹⁷). The position data 
for each epoch are summarized in Supplementary Table 1. Raw data 
from the six original sites involved coseismic deformation steps owing 
to the occurrence of the Tohoku-oki earthquake. We removed these 
steps from the raw data according to a coseismic source model, established 
using onshore and seafloor geodetic data¹⁸. In addition, the raw 
data from all sites involved postseismic deformations resulting from 
afterslip and viscoelastic relaxation following the Tohoku-oki earthquake. 
We removed this nonlinear deformation by using a postseismic 
model, calculated by means of a three-dimensional finite-element 
method¹⁹. 

The corrected data gave us seafloor velocity fields that reflect strain 
accumulation processes that occurred when all sites moved at stable 
displacement rates owing to subduction of the Philippine Sea plate 
under the upper Amur plate. Extended Data Figs 2-4 present linear 
trends fitted to the time series using a robust regression method (the 
M estimation method); Table 1 lists each velocity. The lines fitted to 
the east-west and north-south time series in Extended Data Figs 2-4 
represent the estimated linear site velocities and are shown with their 
95% confidence intervals, which are used for the confidence ellipses 
in Fig. 1. Vertical velocities were not detected, because they were less 
than the detection limit (3–4 cm year⁻¹). 
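
The text names M estimation as the robust regression method but gives no further detail. As an illustration only, the sketch below fits a linear trend with a Huber-type M-estimator solved by iteratively reweighted least squares; the Huber weight function, the tuning constant 1.345, the MAD-based scale and the synthetic time series are all assumptions and are not taken from the paper.

```python
from statistics import median


def weighted_line_fit(t, y, w):
    """Weighted least-squares line y = slope * t + intercept."""
    total = sum(w)
    t_mean = sum(wi * ti for wi, ti in zip(w, t)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (ti - t_mean) ** 2 for wi, ti in zip(w, t))
    sxy = sum(wi * (ti - t_mean) * (yi - y_mean) for wi, ti, yi in zip(w, t, y))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def huber_line_fit(t, y, k=1.345, iterations=50):
    """Huber M-estimate of a linear trend via iteratively reweighted least squares."""
    weights = [1.0] * len(t)
    slope, intercept = weighted_line_fit(t, y, weights)
    for _ in range(iterations):
        residuals = [yi - (slope * ti + intercept) for ti, yi in zip(t, y)]
        centre = median(residuals)
        # Robust scale from the median absolute deviation (MAD)
        scale = median(abs(r - centre) for r in residuals) / 0.6745
        if scale == 0.0:
            break
        limit = k * scale
        # Full weight inside the limit, down-weighted outside it
        weights = [1.0 if abs(r) <= limit else limit / abs(r) for r in residuals]
        slope, intercept = weighted_line_fit(t, y, weights)
    return slope, intercept


# Synthetic east-west positions (cm): true rate -4.0 cm/year, one outlying epoch
epochs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
positions = [-4.0 * ti + ni for ti, ni in zip(epochs, noise)]
positions[5] += 5.0  # simulated bad epoch

ols_slope, _ = weighted_line_fit(epochs, positions, [1.0] * len(epochs))
robust_slope, _ = huber_line_fit(epochs, positions)

if __name__ == "__main__":
    print(f"ordinary least squares: {ols_slope:.2f} cm/year")
    print(f"Huber M-estimate:       {robust_slope:.2f} cm/year")
```

The design point is that a single bad epoch pulls an ordinary least-squares trend noticeably, whereas the down-weighting step limits its influence, which matters for campaign-style observations with few epochs per site.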

Long periods of observation at the six original sites made the con- 
fidence ellipses small. These velocity fields are also compared with 
onshore global navigation satellite system (GNSS) data (Fig. 1), cal- 
culated for the stable period from March 2006 to December 2009, and 
with the rates of convergence of the Philippine Sea plate to the Amur 
plate (calculated using the MORVEL model; see Fig. 1). Our seafloor 
data are roughly consistent with the orientation of plate convergence 
and with onshore velocities. Therefore, all offshore regions on the inter- 
plate boundary have positive SDRs. 

These new data have great potential for advancing the estimation of 
SDR distributions²⁰. Onshore data cannot resolve offshore interplate 
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Figure 1 | Seafloor velocity field, based on seafloor geodetic 
observations at 15 seafloor sites along the Nankai Trough. Seafloor 
velocity vectors are shown with red arrows; each ellipse indicates the 95% 
confidence level. Onshore velocity vectors were calculated for the period 
from March 2006 to December 2009 using GEONET stations, and are 
shown with light grey arrows. Seafloor sites are named with four characters 
(including a letter and a number). The yellow arrow indicates the 
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boundaries (see Methods). In contrast, our seafloor data can show the 
offshore heterogeneity (although regions adjacent to the trench axis— 
other than those near the TOK1 and ASZ2 sites—cannot be resolved). 
Figure 2a shows the SDR distribution model established using seafloor 
geodetic data (see Methods and Supplementary Tables 2 and 3 for more 
details). 

Along the Nankai Trough, subducting seamounts are located in three 
regions²¹⁻²³ at which very-low-frequency earthquakes (VLFEs) have 
been activated²⁴. Below, we discuss mainly the relation of the shallow 
SDR distribution with these seismological and geological features and 
with the latest and predicted megathrust earthquake source regions, 
from west to east. The distribution of deep SDRs in our model is 
robustly similar to that obtained in past studies using only onshore 
data⁸⁻¹⁰. 

For region A, one edge could not be adequately resolved in the model. 
Only at the shallowest site in region A, namely HYG2, could we directly 
catch a glimpse of the undersea SDR. The displacement rate at HYG2 
was lower than the rates at adjacent sites (namely HYG1, ASZ1 and 
ASZ2; confidence levels 95%, 95% and 90%) according to our tests 
of parallelism between each east-west component. These data and 
our model suggest that the region of VLFE occurrence that extends to 
the east of the Kyushu-Palau ridge has a lower SDR than do adjacent 
undersea regions. This spatial relationship suggests that the subducting 
ridge not only activates shallow VLFEs, but also forms the low-SDR 
region (this is a ‘low-coupling’ condition). 
convergence of the Philippine Sea plate under the Amur plate, calculated 
(using the MORVEL model¹⁷) to be occurring at a rate of 6.5 cm year⁻¹. 
The purple region is the region of maximum earthquake sources, provided 
as the worst-case scenario by the Central Disaster Management Council of 
the Japanese Government. Seafloor topography is based on the J-EGG500 
data set from the Japan Oceanographic Data Center (JODC) of the JHOD. 
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In region B, the deep region of high SDRs corresponds with the 
source region of the Nankaido earthquake of 1946. This region extends 
to the shallow side near the trench axis, which showed no slip in the 


Table 1 | Velocity of each site with respect to the Amur plate 

           Position             Velocity (cm year⁻¹)             Standard deviation (cm year⁻¹) and correlation
Site name  Latitude  Longitude  Absolute  East (E)  North (N)   σ(E)  σ(N)  Corr(E,N)
TOK1       34.08     138.14     5.0       −4.9      0.9         0.2   0.1    0.0
TOK2       33.88     137.61     4.9       −4.8      1.0         0.2   0.1   −0.1
TOK3       34.18     137.39     5.2       −5.1      0.8         0.4   0.5   −0.1
KUM1       33.67     137.00     3.6       −3.6      0.7         0.1   0.2    0.2
KUM2       33.43     136.67     4.3       −4.2      1.0         0.5   0.9   −0.5
KUM3       33.33     136.36     4.0       −3.9      1.0         0.2   0.2   −0.1
SIOW       33.16     135.57     4.7       −4.4      1.6         0.2   0.2    0.0
MRT1       33.35     134.94     3.4       −3.3      1.0         0.4   0.4    0.9
MRT2       32.87     134.81     3.9       −3.8      1.0         0.2   0.2   −0.1
TOS1       32.82     133.67     5.5       −4.7      2.8         0.6   0.4   −0.4
TOS2       32.43     134.03     4.8       −4.2      2.4         0.5   0.5   −0.4
ASZ1       32.37     133.22     4.5       −4.1      1.9         0.3   0.4   −0.2
ASZ2       31.93     133.58     4.2       −3.9      1.7         0.6   0.4   −0.5
HYG1       32.38     132.42     3.8       −3.1      2.1         0.4   0.3   −0.1
HYG2       31.97     132.49     2.0       −2.0      0.3         0.6   0.7   −0.6
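
The 'Absolute' column of Table 1 can be related to the east and north components by simple vector arithmetic. The sketch below, using component values read from two rows of the table (TOK1 and HYG2), computes the horizontal speed and the direction of motion as an azimuth clockwise from north; the function name and the azimuth convention are illustrative choices rather than anything specified in the text.

```python
import math


def speed_and_azimuth(east, north):
    """Horizontal speed and azimuth (degrees clockwise from north) of a velocity."""
    speed = math.hypot(east, north)
    azimuth = math.degrees(math.atan2(east, north)) % 360.0
    return speed, azimuth


# East and north components in cm/year, as read from Table 1
sites = {
    "TOK1": (-4.9, 0.9),
    "HYG2": (-2.0, 0.3),
}

if __name__ == "__main__":
    for name, (east, north) in sites.items():
        speed, azimuth = speed_and_azimuth(east, north)
        print(f"{name}: {speed:.1f} cm/year towards azimuth {azimuth:.0f} deg")
```

Both sites move roughly west-northwest relative to the Amur plate, in line with the statement that the seafloor velocities are roughly consistent with the orientation of plate convergence; the much smaller speed at HYG2 is what the text interprets as a low-SDR region.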
Figure 2 | Interplate SDR distribution as indicated by onshore and 
seafloor geodetic data. a, Contour map showing the SDR distribution 
(for SDRs of more than 3 cm year⁻¹) obtained using onshore and seafloor 
geodetic data. Light blue dots indicate shallow VLFEs²⁴; darker blue 
regions denote subducting seamount and ridges²¹⁻²³. Dotted lines delimit 
regions indicated to be sources of future earthquakes by the Tokai model; 


latest event. We detected no conspicuous activity of the VLFEs or subducting 
seamount in this near-trench region. In this high-SDR region, 
there are patches of ‘overshot’ SDR (where movement has been roughly 
6.5 cm year⁻¹, more than the convergence rate), as in past studies⁶,⁷. 
These patches probably result from interseismic viscoelastic effects, 
or from underestimation of the convergence rate. 

The broad area of high SDRs is segmented in the eastern region, 
C, which is estimated to have a lower SDR than the neighbouring 
regions, B and D. Additionally, the zones of VLFE activity and sub- 
ducting seamount in region C are located together, as in region A. 
This spatial correspondence is additional evidence that the three 
phenomena (VLFE activity, subducting seamount and SDR) have a 
physical correlation. 

In region D, where the Kii Peninsula protrudes to the south, the 
obtained SDR distribution corresponds with the source regions for the 
1946 Nankaido and 1944 Tonankai earthquakes. The high-SDR region 
F also corresponds with the Tokai region, in which major earthquakes 
have occurred and future earthquakes are predicted. However, region 
F extends to the southwest, into an area that had no slip in the 1944 
Tonankai earthquake and was not included in the source model of the 
future Tokai earthquake. Regions D and F are partitioned by the low-SDR region E. 

Below the shallow seafloor, from regions D to F, the Paleo-Zenisu 
ridge is subducting in the area nearest to the trench axis. However, 
intensive VLFE activity is located in the middle region, E. Therefore, 
the VLFE activity correlates with the low SDR of region E rather 
than with the subduction (although note that the resolving power is 
insufficient in the shallower region to the south, in contrast to the 
deeper region around our seafloor sites). 
solid lines denote the regions of assumed large slip (more than 2 metres) 
resulting from the 1946 Nankaido and 1944 Tonankai earthquakes. 

b, A schematic illustration of the segmented source regions along the 
shallow side of the Nankai Trough. Darker pink shading shows regions 
with high SDRs. c-e, Separate figures showing SDR distribution, shallow 
VLFEs and subducting seamounts. 


Observational studies in subduction zones worldwide suggest a 
relationship between the low-SDR condition and the subducting areas 
in front of topographic features, including seamounts. VLFE activity 
has also been predicted to be related to the low-SDR condition. For 
the Nankai Trough, indirect seismological evidence from seafloor 
studies has implied a physical relationship between the low-SDR condition 
and ridges and VLFEs, a relationship that is probably the result 
of elevated pore-fluid pressure and a complicated fracture network. 
Our discovery of the three low-SDR regions A, C and E is direct evi- 
dence that subducting seamounts generate VLFE activity, which in 
turn causes a low SDR. Our findings also suggest that VLFEs might 
be activated in the low-SDR region in front of subducting seamounts 
worldwide. 

Figure 2b illustrates the segmented source regions discovered 
through our observations. This ‘shallow’ segmentation is inconsistent 
with the well-known ‘deep’ segmentation of the Nankai Trough source 
region. Because the shallow, high-SDR patches control the scale of tsu- 
namis that result from megathrust ruptures, they are important for 
assessment and early warning. For example, the Tohoku-oki earthquake 
had a large amount of very shallow slip, which led to a devastating 
tsunami. The high-SDR regions B and F are located on the outer shal- 
low zones of the most recent and predicted earthquake sources; there 
have been no historical records of earthquakes in regions B and F since 
1854 (ref. 2). Thus these shallow regions have accumulated slip deficit 
since at least this time and could drive shallow ruptures and tsunamis. 

Low SDRs are suggested for regions A, C and E, which segment and 
separate the high-SDR (megathrust earthquake source) regions, thereby 
possibly regulating earthquake dynamics. For example, the 1944 
Tonankai earthquake began in region D and was halted in region E, 
in front of region F. However, when a rupture breaks through a 
segment boundary, a larger event is possible. The 1946 Nankaido 
earthquake progressed from region D through region C, finally reaching the 
deep side of region B. 

Our observations and our model for SDR distribution reflect crustal 
deformation in the past few years only. We plan to perform continuous 
observations over decades to investigate the stability of interplate SDR 
distributions. This will also allow us to determine whether decadal-scale 
changes in crustal deformation, like those observed in eastern Japan, occur in 
this subduction zone. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


Seafloor geodetic observations. Because radio waves scatter in seawater, 
we measure seafloor movements by combining GPS observations above the sea 
with acoustic ranging under the sea. This GPS-A method is a unique 
approach to monitoring absolute horizontal movement directly above an 
offshore interplate boundary. The technique was proposed in the 1980s (ref. 31) and 
established in the 1990s and later. However, the precision achieved 
in these earlier studies was lower than that of our method, so an 
uneconomically long observation period was needed. 
Since 2000, the JHOD has been developing highly precise and sustainable 
observation techniques and has provided valuable data for geodesy and 
seismology, such as data on the preseismic, coseismic and postseismic 
seafloor deformations of the 2011 Tohoku-oki earthquake. 

Extended Data Fig. 1 shows our seafloor geodetic observation system. 
It consists of a seafloor unit with four acoustic mirror-type transponders, and an 
on-board unit with an undersea acoustic transducer, a GPS antenna/receiver 
and a dynamic motion sensor. Before 2007, the on-board acoustic 
transducers were mounted at the stern of survey vessels for a drifting survey. After 2008, 
we introduced a hull-mounted system to perform a line-controlled sailing survey, 
which improved stability and efficiency. 

This system acquires three kinds of data. Kinematic GPS data are gathered to 
determine the absolute position of the survey vessel. Attitude data on the sur- 
vey vessel are also obtained on board by a dynamic motion sensor, to determine 
the coordinates of the on-board transducer relative to those of the GPS antenna. 
Distance data from the on-board transducer to the seafloor acoustic transponders 
are measured by the acoustic ranging technique. The obtained round-trip acoustic 
travel times are transformed to the ranges using profiles of sound speed in seawater. 
These profiles are obtained using temperature and salinity profilers—namely 
a conductivity temperature depth profiler (CTD), an expendable conductivity 
temperature depth profiler (XCTD) and expendable bathythermographs (XBTs)— 
every few hours. 
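
The conversion from round-trip travel time to range described above can be sketched as follows. This is a minimal illustration that assumes a horizontally layered ocean and a straight acoustic path, so that the effective speed is the harmonic mean of the sound-speed profile; this simplification, the function names and the sample numbers are assumptions for illustration and do not describe the actual processing software.

```python
import numpy as np

def harmonic_mean_speed(depths_m, speeds_m_s):
    """Effective sound speed over a layered profile (depth-weighted harmonic mean)."""
    depths = np.asarray(depths_m, dtype=float)
    speeds = np.asarray(speeds_m_s, dtype=float)
    layer_thickness = np.diff(depths)
    layer_speed = 0.5 * (speeds[:-1] + speeds[1:])  # mid-layer speed
    return layer_thickness.sum() / np.sum(layer_thickness / layer_speed)

def slant_range(round_trip_s, depths_m, speeds_m_s):
    """One-way range from a round-trip travel time (straight-ray assumption)."""
    return 0.5 * round_trip_s * harmonic_mean_speed(depths_m, speeds_m_s)

# Hypothetical profile: sound speed sampled at three depths.
depths = [0.0, 1000.0, 2000.0]
speeds = [1520.0, 1485.0, 1495.0]
example_range = slant_range(2.7, depths, speeds)
print(f"range = {example_range:.1f} m")
```

In practice the acoustic path bends with the sound-speed gradient, so a ray-tracing step would replace the straight-path assumption used here.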

The consecutive absolute positions of the on-board transducer were determined 
by kinematic GPS analysis using IT software and attitude data on the survey 
vessel. The position references are the onshore GEONET stations operated by 
the Geospatial Information Authority of Japan (GSI). The resulting position of 
the seafloor transponder was determined using a linearized inversion method 
based on a least-squares formulation, combining the absolute on-board transducer 
positions and the ranges to the seafloor acoustic transponders. This final analysis 
was constrained by the positional relationship of the grouped transponders for all 
epochs. This analysis cannot provide substantive information on the positioning 
error of each epoch, because we combine the independent observations to estimate 
all the positions. 
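
The idea of a linearized least-squares inversion for a transponder position can be illustrated with a generic Gauss-Newton iteration. The sketch below is not the analysis used in this study: it locates a single transponder from noise-free synthetic ranges and omits the constraint on the grouped transponder geometry across epochs; all names and numbers are hypothetical.

```python
import numpy as np

def locate_transponder(transducer_xyz, ranges_m, initial_guess, n_iter=10):
    """Gauss-Newton least-squares estimate of one transponder position."""
    xyz = np.asarray(transducer_xyz, dtype=float)   # (n, 3) transducer positions
    r_obs = np.asarray(ranges_m, dtype=float)       # (n,) measured ranges
    p = np.asarray(initial_guess, dtype=float).copy()
    for _ in range(n_iter):
        diff = p - xyz
        r_pred = np.linalg.norm(diff, axis=1)
        jacobian = diff / r_pred[:, None]           # d(range)/d(position)
        step, *_ = np.linalg.lstsq(jacobian, r_obs - r_pred, rcond=None)
        p += step                                   # linearized update
    return p

# Synthetic check: nine noise-free ranges from a 3 x 3 grid of sea-surface positions.
true_position = np.array([100.0, -50.0, -2000.0])
grid = np.array([[x, y, 0.0] for x in (-1000.0, 0.0, 1000.0)
                             for y in (-1000.0, 0.0, 1000.0)])
ranges = np.linalg.norm(grid - true_position, axis=1)
estimate = locate_transponder(grid, ranges, initial_guess=[0.0, 0.0, -1800.0])
print(estimate)
```

Because all transducer positions lie at the sea surface, the starting guess must be below the surface so that the iteration converges to the seafloor solution and not to its mirror image.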

To stabilize the estimates, we acquire acoustic ranging data of 3,000-5,000 
shots for one observation at each site, each observation taking about 24 hours. 
The observation uncertainty of this technique is 2-3 cm in the horizontal 
component in each epoch. However, the vertical component has a much larger uncertainty, 
because we observe the seafloor only from above (much as GNSS 
does). The detection limit of the vertical velocity is 3-4 cm year⁻¹. 
Data processing. Raw seafloor geodetic observation data, expressed relative to the Amur 
Plate in the MORVEL data set, are shown as red circles before the Tohoku-oki 
earthquake and blue circles after this earthquake in Extended Data Figs 2-4. 
We deducted coseismic and nonlinear postseismic effects resulting from the 
Tohoku-oki earthquake from these raw data. We calculated the coseismic effects 
on the basis of the coseismic source model, which was established using many 
onshore stations of the GNSS network (namely GEONET, Tohoku University, and 
others), seafloor geodetic data, and data obtained from ocean-bottom pressure 
gauges installed by the Earthquake Research Institute, University of Tokyo. We 
calculated the postseismic effects on the basis of the deformation model, which was 
established using this coseismic slip model and fitted to the GSI’s onshore 
data and seafloor data (observed by Tohoku University and us) following the 
Tohoku-oki earthquake by means of a three-dimensional spherical-Earth 
finite-element model. The prototype of this viscoelastic model was established 
by including not only the mantle wedge and the oceanic mantle but also the 
lithosphere-asthenosphere boundary, and is based on the afterslip model 
developed in ref. 29. The model used for the calculation of postseismic effects has been 
revised with regard to shallow afterslip. These coseismic and postseismic effects 
are shown in Extended Data Figs 2-4. These effects are large in eastern sites, near 
the source region of the Tohoku-oki earthquake, and very small in western sites. 
The corrected data are shown as red circles in Extended Data Figs 2-4; these final 
data and the raw data are in Supplementary Table 1. 

For estimations of site velocities, we used a robust regression method (M-estimation) 
with Tukey’s biweight function. This method mitigates the negative 
influences of outliers, which are mainly due to disturbances of the undersea sound 
speed structure. 
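
A robust linear fit of this kind can be sketched with iteratively reweighted least squares. The example below is a generic illustration, not the implementation used here: the tuning constant 4.685, the scale estimate based on the median absolute residual and the synthetic time series are all assumptions.

```python
import numpy as np

def tukey_biweight_line(t, y, c=4.685, n_iter=50):
    """Fit y = intercept + slope * t by M-estimation with Tukey's biweight."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(t), t])
    weights = np.ones_like(y)
    coef = np.zeros(2)
    for _ in range(n_iter):
        root_w = np.sqrt(weights)
        coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
        resid = y - design @ coef
        scale = 1.4826 * np.median(np.abs(resid))   # robust scale (assumed choice)
        if scale == 0.0:
            break                                   # perfect fit, nothing to reweight
        u = resid / (c * scale)
        # Biweight: smooth down-weighting, zero weight beyond |u| >= 1.
        weights = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
    return coef[0], coef[1]

# Synthetic displacement series (cm) with a -4 cm/yr trend and one gross outlier.
t = np.arange(10.0)
y = 2.0 - 4.0 * t + 0.01 * np.where(np.arange(10) % 2 == 0, 1.0, -1.0)
y[5] += 50.0
intercept, slope = tukey_biweight_line(t, y)
print(f"slope = {slope:.3f} cm/yr")
```

An ordinary least-squares fit to the same series would be pulled towards the outlier, whereas the biweight assigns it zero weight after the first iteration.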

Interplate SDR inversion method. We constructed the SDR distribution model 
by means of a geodetic inversion using the method of ref. 20. This method includes 
two prior-constraint constants (α and σ). We determined the best estimates of 
these ‘hyperparameters’ by minimizing the Akaike Bayesian information criterion 
(ABIC) and obtained an optimal model. The SDR distribution model was 
fitted to our seafloor geodetic data and onshore GNSS data. 

We set up a fault model extending approximately 800 km in the strike direction (237°) 
and approximately 300 km in the dip direction on the plate boundary. Our fault 
model was placed directly on the interplate boundary model known as the CAMP 
standard model. We used a B-spline function as a basis function and calculated 
the SDR values by distributing subfaults on the plate boundary. We calculated 
Green's functions using the formulation of ref. 20, considering a homogeneous 
elastic half-space. We deployed this model over a broader area than that covered by 
our seafloor sites. Thus, we also used the onshore GNSS data of GEONET around our fault model. These 
onshore data were calculated for the stable period from March 2006 to December 
2009. To avoid biases in the inversion and keep the resolution of the SDR 
model smooth, we subsampled the GEONET stations. All onshore and offshore 
stations were weighted equally. Model boundary conditions are 
detailed below in the section ‘Model boundary condition effect’. 

The northern edge of our model could be affected by block motions, because 
the block boundaries around western Japan are located near the northern boundary 
of our model region. However, a detailed investigation of block motions showed 
that the maximum deformation rate of the block boundary (the median tectonic 
line) does not greatly affect the undersea SDR distribution (the distribution is 
shifted by less than 8 mm year⁻¹ in the northernmost region of Shikoku Island and 
by 3 mm year⁻¹ in the eastern region). Intraplate deformation has a negligible effect 
in this SDR model calculation. Although splay faults also have implications 
for the generation of megathrust earthquakes and tsunamis, these smaller-scale 
fault geometries cannot be monitored or discussed using our present seafloor 
geodetic observation network. 

Our SDR distribution model is shown in Fig. 2 and Extended Data Fig. 5b. 
Hyperparameter values for the prior constants (α and σ) were 1.6 × 10⁻¹ and 
1.3 × 10⁻¹. Our data improve the previous model, which used only onshore data 
(Extended Data Fig. 5a). The calculated SDR values for subfaults, and a comparison 
between observed and calculated data, are described in Supplementary Tables 2 
and 3, respectively. 
Resolution of the SDR inversion. We carried out checkerboard resolution 
tests in order to examine the SDR model. We generated synthetic data for the 
checkerboard-like SDR distributions (Extended Data Fig. 6a) with 2σ errors of 
0.3 cm year⁻¹ and 1.5 cm year⁻¹ for onshore and seafloor data, respectively. The 
synthetic data were inverted using the same parameters and settings as for the 
SDR inversion. Extended Data Fig. 6 shows the resulting distributions, using only 
onshore data (Extended Data Fig. 6b) or both onshore and seafloor data (Extended 
Data Fig. 6c). An offshore region that was unresolved in Extended Data Fig. 6b was 
clearly resolved in Extended Data Fig. 6c. 

Extended Data Fig. 6 also shows resolution values as diagonal elements of the 
resolution matrix, calculated for onshore data only (Extended Data Fig. 6d) or for 
both onshore and seafloor data (Extended Data Fig. 6e). The resolution matrix is 
represented as $R = (H^{T}H + \alpha^{2}G^{T}G)^{-1}H^{T}H$, where $H$ is the static-response-function 
matrix; $\alpha$ is the hyperparameter of smoothness; $G$ is the spatial smoothness 
matrix; and the superscript T denotes the transposed matrix. 
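
The resolution matrix defined above can be evaluated directly once the response and smoothness matrices are available. The sketch below uses small hypothetical matrices (four observations, three model parameters, a first-difference smoothness operator) purely to show the calculation; it is not the fault model of this study.

```python
import numpy as np

def resolution_matrix(H, G, alpha):
    """R = (H^T H + alpha^2 G^T G)^(-1) H^T H, computed with a linear solve."""
    H = np.asarray(H, dtype=float)
    G = np.asarray(G, dtype=float)
    HtH = H.T @ H
    return np.linalg.solve(HtH + alpha**2 * (G.T @ G), HtH)

# Hypothetical response matrix and first-difference smoothness matrix.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
G = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])

R_unsmoothed = resolution_matrix(H, G, alpha=0.0)
R_smoothed = resolution_matrix(H, G, alpha=1.0)
print(np.diag(R_smoothed))  # resolution values, as mapped for each subfault
```

Without smoothing the matrix reduces to the identity when $H$ has full column rank; increasing $\alpha$ lowers the trace, which reflects the trade-off between smoothness and resolution.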

Undersea areas with low values, shown in Extended Data Fig. 6d, were improved 
by the seafloor data (Extended Data Fig. 6e), although the region adjacent to the 
trench axis cannot be resolved even with our seafloor network because there are no 
sites there (other than those near the TOK1 and ASZ2 sites). 

We also set Green's functions to zero for outer subfaults of the interplate 
boundary (to the south of the trench). These subfaults affect neighbouring 
low-resolution subfaults through spatial smoothing. Thus, the resolving power for the 
shallowest subfaults to the south of our seafloor sites was not sufficient (Extended Data 
Fig. 6c). 

Model boundary condition effect. We examined the boundary condition of 
this inversion model. Each test was calculated using the best hyperparameters, 
determined by minimizing the ABIC. 

In the resulting SDR model, we used a ‘zero backslip (full creeping)’ condition 
at the trench side boundary, and a free condition at the other boundaries. To 
examine the effect of the model boundary condition, we also tested a free 
condition at all the boundaries; zero backslip at all the boundaries; and 
a 6.5 cm year⁻¹ constraint at the south edge (the other boundaries being in the free 
condition) (Extended Data Figs 7a-c). The results suggest that the boundary 
condition did not control 
the main part of the undersea SDR calculation, except in low-resolution shallow 
areas. There were small differences in the r.m.s. of misfits between observations 
and calculations in these cases. 

VLFE distribution. The VLFE distribution in Fig. 2a, d was determined by 
automatic analysis using the method of ref. 46. This approach separates VLFEs and 
ordinary earthquakes automatically by comparing them with the Hi-net catalogue 
produced by the National Research Institute for Earth Science and Disaster 
Resilience. (Aftershocks following the magnitude-7 event could not be fully 
differentiated.) We plotted the non-ordinary events (mainly VLFEs) from the period 
1 August 2008 to 10 May 2015 without including the aftershocks resulting from 
the Kii Peninsula earthquake of 2004. We also plotted all the events detected in the 
period 1 June 2003 to 10 May 2015 (Extended Data Fig. 8). 

Subducting seamounts. Previous reflection and refraction surveys were 
carried out broadly along the Nankai Trough, guided by past geomagnetic studies 
and by known seismic and bathymetric information. These surveys detected three 
subducting seamounts (Fig. 2a, e) in front of the visible bathymetric features (the 
Kyushu-Palau ridge, the Kinan seamount chain and the Zenisu ridge) shown in 
Fig. 1. The VLFEs were activated around these regions. 
Extended Data Figure 1 | Our GPS-A seafloor geodetic observation 
system. The system, comprising on-board and seafloor sensors, was 
developed as described in ref. 11 and improved as described in ref. 35. 
This diagram was modified from refs 15, 34, 37. CTD, conductivity 
temperature depth profiler; XCTD, expendable conductivity temperature 
depth profiler; XBT, expendable bathythermographs. 
Extended Data Figure 2 | Horizontal movements of seafloor sites 
over time. Time series for east-west (left column) and north-south 
(right column) displacements of seafloor sites TOK1, TOK2, TOK3, 
KUM1, KUM2 and KUM3 from 2006 onwards. The position reference 
is the Amur plate. Blue circles indicate raw observations before 
deductions of the coseismic and postseismic effects that resulted from 
the magnitude-9.0 Tohoku-oki earthquake of 11 March 2011. Red circles 
indicate the corrected final results. The linear trends and the 95% 
two-sided confidence intervals are shown with red solid and dashed lines, 
respectively. Grey lines show the calculated coseismic and postseismic 
deformations following the Tohoku-oki earthquake. 


© 2016 Macmillan Publishers Limited. All rights reserved 


[Extended Data Figure 3 plot: east-west (left) and north-south (right) displacement time series, 2006-2016, for panels (7) SIOW, (8) MRT1, (9) MRT2, (10) TOS1, (11) TOS2 and (12) ASZ1, with the M9.0 event of 2011/3/11 marked.]


Extended Data Figure 3 | Horizontal movements of seafloor sites over time. Time series showing east-west (left column) and north-south 
(right column) displacements of seafloor sites SIOW, MRT1, MRT2, TOS1, TOS2 and ASZ1. Colours are as in Extended Data Fig. 2. 
Extended Data Figure 4 | Horizontal movements of seafloor sites over time. Time series showing east-west (left column) and north-south 
(right column) displacements of seafloor sites ASZ2, HYG1 and HYG2. Colours are as in Extended Data Fig. 2. 
Extended Data Figure 5 | Comparison of calculated SDR distributions 
obtained using onshore data only versus onshore plus seafloor data. 
a, b, SDR distributions (for SDRs of more than 2 cm year⁻¹) calculated 
using only onshore data (a) or using onshore and seafloor data (b). 
Black and white vectors indicate the observed data and the calculated 
velocities, respectively. Grey shading indicates areas with resolution 
values lower than 0.05 (calculated in Extended Data Fig. 6d, e). α is the 
hyperparameter of smoothness. 
Extended Data Figure 6 | Checkerboard resolution tests and 
distributions of resolution values. a-c, Results of checkerboard 
resolution tests for the SDR inversions. a, Input checkerboard-like 
SDR distribution. b, c, The checkerboard distributions calculated 
using onshore data only (b) or using both onshore and seafloor data (c). 
d, e, Resolution values as diagonal elements of the resolution matrix, 
calculated using onshore data only (d) or using onshore and seafloor 
data (e). Black and blue dots denote the onshore and seafloor sites, 
respectively, used in each calculation. α is the hyperparameter of 
smoothness. 
Neural correlates of single-vessel haemodynamic responses in vivo 


Philip O’Herron, Pratik Y. Chhatbar, Manuel Levy, Zhiming Shen, Adrien E. Schramm, Zhongyang Lu & Prakash Kara 


Neural activation increases blood flow locally. This vascular signal 
is used by functional imaging techniques to infer the location and 
strength of neural activity. However, the precise spatial scale 
over which neural and vascular signals are correlated is unknown. 
Furthermore, the relative role of synaptic and spiking activity in 
driving haemodynamic signals is controversial. Previous studies 
recorded local field potentials as a measure of synaptic activity 
together with spiking activity and low-resolution haemodynamic 
imaging. Here we used two-photon microscopy to measure 
sensory-evoked responses of individual blood vessels (dilation, 
blood velocity) while imaging synaptic and spiking activity in 
the surrounding tissue using fluorescent glutamate and calcium 
sensors. In cat primary visual cortex, where neurons are clustered 
by their preference for stimulus orientation, we discovered new 
maps for excitatory synaptic activity, which were organized 
similarly to those for spiking activity but were less selective for 
stimulus orientation and direction. We generated tuning curves 
for individual vessel responses for the first time and found that 
parenchymal vessels in cortical layer 2/3 were orientation selective. 
Neighbouring penetrating arterioles had different orientation 
preferences. Pial surface arteries in cats, as well as surface arteries 
and penetrating arterioles in rat visual cortex (where orientation 
maps do not exist), responded to visual stimuli but had no 
orientation selectivity. We integrated synaptic or spiking responses 
around individual parenchymal vessels in cats and established 
that the vascular and neural responses had the same orientation 
preference. However, synaptic and spiking responses were more 
selective than vascular responses—vessels frequently responded 
robustly to stimuli that evoked little to no neural activity in the 
surrounding tissue. Thus, local neural and haemodynamic signals 
were partly decoupled. Together, these results indicate that intrinsic 
cortical properties, such as propagation of vascular dilation 
between neighbouring columns, need to be accounted for when 
decoding haemodynamic signals. 

To determine how neural activity leads to changes in cerebral blood 
flow, the haemodynamic responses of individual vessels need to be 
compared to neural activity in the surrounding tissue. While 
sensory-evoked responses of individual vessels have been measured in the 
somatosensory cortex and olfactory bulb of rodents, these stud- 
ies have not measured vessel responses to the full range of stim- 
uli for which the neighbouring neural tissue is responsive. Thus, 
the degree to which vascular signals match local neural activity 
has been difficult to assess. Here we compare neural and vascu- 
lar responses to a full range of stimulus orientations in cat primary 
visual cortex to determine if vascular responses can be predicted 
from local neural activity. Additionally, the primary visual cortex 
of the cat, similar to that of primates including humans, is organ- 
ized into precise maps such that different columns of neural tissue 
are optimally activated by different stimulus orientations (Fig. 1a). 
Therefore the orientation selectivity of vessel responses can be linked 
to the spatial scale of neurovascular coupling. For example, if blood 


flow in a single cortical vessel is sensitive to neural activity over a 
large spatial scale covering many orientation columns, then the vessel 
should dilate to a broad range of stimulus orientations. By contrast, 
if the vascular response is controlled very locally, that is, within the 
scale of an orientation column, then individual vessels may be highly 
orientation selective. 

We first labelled blood vessels in the cat primary visual cortex 
with the fluorescent indicators Texas Red Dextran or Alexa 633 (see 
Methods), and measured the dilation responses to drifting grating 
stimuli of different orientations. Veins and capillaries, which were 
distinguished from arteries by a number of means (see Methods), were 
not included in this initial analysis because they rarely exhibit rapid 
sensory-evoked dilation. Our data set included all other blood vessels, 
provided that they were sufficiently labelled and imaged in tissue 
with minimal movements from respiration. All blood vessels in this 
data set dilated in response to drifting grating visual stimuli (P < 0.05, 
analysis of variance (ANOVA)). Specifically, we found that parenchymal 
arterioles in layer 2/3 typically dilated more strongly in response 
to one or two of the stimulus orientations presented (Fig. 1b), whereas 
pial surface arteries dilated to all orientations nearly equally (Fig. 1c). 
For each vessel, we computed the orientation selectivity index (OSI; 
see Methods), such that when a vessel dilates equally to all stimulus 
orientations the OSI = 0 and when a vessel responds only to a single 
orientation the OSI = 1. The OSI was much greater for parenchymal 
arterioles than for pial surface arteries (OSI parenchymal arteriole 
mean ± standard error of the mean (s.e.m.) = 0.21 ± 0.01; n = 79 vessels 
and OSI surface artery mean ± s.e.m. = 0.06 ± 0.01; n = 24 vessels; 
P<10-; Mann-Whitney test; Fig. 1d). 
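
An index with the two limiting properties stated above (0 for equal responses, 1 for a response to a single orientation) can be computed by vector averaging in orientation space. The sketch below uses that common definition as an assumption; the exact formula used in this study is given in the Methods, and the function name and sample responses are hypothetical.

```python
import numpy as np

def orientation_selectivity(responses, directions_deg):
    """Return (OSI, preferred orientation in degrees) by vector averaging.

    Responses are assumed to be non-negative. Directions 180 degrees apart
    share an orientation, so angles are doubled before averaging.
    """
    r = np.asarray(responses, dtype=float)
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    total = r.sum()
    if total == 0.0:
        return 0.0, float("nan")
    vector = np.sum(r * np.exp(2j * theta))
    osi = float(np.abs(vector) / total)
    preferred = float((np.rad2deg(np.angle(vector)) / 2.0) % 180.0)
    return osi, preferred

directions = np.arange(0, 360, 45)  # eight directions of motion

flat_osi, _ = orientation_selectivity(np.ones(8), directions)
single_osi, single_pref = orientation_selectivity(
    [0, 0, 1, 0, 0, 0, 1, 0], directions)  # responds only at 90 and 270 degrees
print(flat_osi, single_osi, single_pref)
```

With this definition, broadly tuned vessels yield values near 0 and sharply tuned neurons yield values approaching 1.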

To illustrate further the role of an organized map of neocortical 
neurons in generating tuned parenchymal vessel responses, we also 
measured dilation changes in rat primary visual cortex. Because 
cortical neurons in rats are not organized in an orientation map, 
each parenchymal vessel is surrounded by neurons displaying a 
variety of orientation preferences (Fig. 1e). In rats, we found no 
orientation selectivity in cortical layer 2/3 parenchymal arterioles 
(Fig. 1f; OSI mean ± s.e.m. = 0.06 ± 0.01; n = 16 vessels) or pial 
surface arteries (Fig. 1g; OSI mean ± s.e.m. = 0.05 ± 0.01; n = 21 vessels) 
(Fig. 1h). 

To compare the orientation selectivity of cat parenchymal vessels to 
spiking activity in the surrounding tissue, we performed calcium imag- 
ing using Oregon Green BAPTA-1 AM (OGB-1 AM) or GCaMP6s, 
along with vascular imaging from the same sites (see Methods). Figure 2a 
shows a penetrating arteriole that dilates most strongly to the same 
stimulus orientation as preferred by the immediately adjacent cortical 
neurons. However, the vessel also dilated when other stimulus orien- 
tations were presented, despite minimal or non-existent responses in 
the nearby neurons. As a result, the vessel had a substantially broader 
OSI (0.20) than the neuronal spiking activity (OSI = 0.82, average 
across six adjacent neurons labelled in Fig. 2a). As a penetrating arte- 
riole is likely to be sensitive to neural activity from more than just the 
immediately adjacent cells, we examined whether spiking activity over 
Figure 1 | Selectivity of blood vessel dilation to sensory stimuli in 
species with and without cortical orientation maps. a, Schematic 
of cat visual cortex showing the columnar organization of neurons by 
orientation preference and a pial surface artery with multiple branches 
penetrating the parenchyma. Different colours of neuronal cell bodies 
represent their different preferred stimulus orientations. b, Time courses 
and polar plots (averages of six (top) and eight (bottom) trials) of the 
changes in dilation of two layer 2/3 arterioles in cat visual cortex to visual 
stimulation. Error bands represent s.e.m. and grey bars represent the 
periods of visual stimulation. In this and subsequent figures, stimuli were 
gratings that drifted in eight different directions of motion and polar plots 
are normalized to the maximum response. c, Time course and polar plot 
of responses from a surface artery in cat (average of four trials). 
d, Population distribution and median OSI for parenchymal (n = 79 vessels 
in 18 cats) and surface (n = 24 vessels in 9 cats) vessels. e, Schematic of 
rat visual cortex where neurons with different orientation preferences 
are intermingled. f, g, Time courses and polar plots of responses from two 
parenchymal arterioles (averages of seven (top) and eight (bottom) trials) 
and a surface artery (average of five trials) in rat visual cortex. 
h, Population distribution and median OSI for parenchymal (n = 16 vessels 
in 6 rats) and surface (n = 21 vessels in 7 rats) vessels. 


a larger region might explain the broad orientation selectivity of the 
parenchymal vessels. Previous work has shown that occlusion of a 
single penetrating arteriole in the neocortex leads to tissue death in 
regions with approximately 400 µm diameter, suggesting that this is 
the region of tissue that an individual penetrating arteriole supplies (see 
also Supplementary Information). Therefore we compared vascular 
responses to calcium signals integrated over 400-µm-diameter windows 
around each parenchymal artery (Fig. 2b, c). The orientation prefer- 
ence of these regions of spiking activity matched those of the arteries 
at their centres (Fig. 2d; R=0.94, n= 19 pairs, P< 10~°). However, 
the orientation selectivity of the spiking activity was higher than the 
corresponding artery in all regions examined (Fig. 2e; P< 107°, paired 
t-test). Because the spacing of penetrating arterioles is heterogeneous 
(see Supplementary Information), we also examined the selectivity of 
neural responses in a wide range of window sizes around each vessel 
(100-600 µm diameter). We found that for all window sizes the spiking 




activity OSI was still at least 60% higher than the vessel dilation OSI 
(Fig. 2f P< 107° at each window size). 
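
The window-based integration described above can be pictured as averaging the tuning curves of all recording sites that fall within a circle of a given diameter centred on the vessel. The sketch below is a simplified illustration under that assumption; the actual integration procedure is described in the Methods, and the function name, coordinates and responses are hypothetical.

```python
import numpy as np

def window_tuning(positions_um, responses, centre_um, diameter_um):
    """Mean tuning curve of all sites inside a circular window around a vessel."""
    positions = np.asarray(positions_um, dtype=float)   # (n_sites, 2) in micrometres
    responses = np.asarray(responses, dtype=float)      # (n_sites, n_stimuli)
    centre = np.asarray(centre_um, dtype=float)
    distance = np.linalg.norm(positions - centre, axis=1)
    inside = distance <= diameter_um / 2.0
    if not inside.any():
        raise ValueError("no recording sites inside the window")
    return responses[inside].mean(axis=0)

# Three hypothetical sites at increasing distance from a vessel at the origin.
positions = [[0.0, 0.0], [150.0, 0.0], [400.0, 0.0]]
responses = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]

narrow = window_tuning(positions, responses, centre_um=[0.0, 0.0], diameter_um=100.0)
wide = window_tuning(positions, responses, centre_um=[0.0, 0.0], diameter_um=400.0)
print(narrow, wide)
```

Widening the window pools sites with different preferences, which is why the integrated tuning curve tends to broaden as the diameter increases.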

Our calcium imaging results suggest that additional sources of 
neural activity (besides spiking in the local tissue around the vessel) 
may be contributing to sensory-evoked vasodilation. Experimental 
and theoretical work has implicated synaptic glutamate release as 
a driver of haemodynamic responses!®. In particular, calculation of 
the energy budget of the neocortex estimated that, of all the cellular 
processes performed, excitatory synaptic activity has the largest metabolic
demand”. Therefore, to measure directly excitatory synaptic
activity over different spatial scales and compare it to single-vessel 
responses, we labelled neurons in the cat visual cortex with a glu- 
tamate sensor (iGluSnFR; see Methods). We found that glutamate 
activity (like neuronal spiking) is organized in direction and orien- 
tation maps (Fig. 3 and Extended Data Fig. 1). However, glutamate 
signals were generally less orientation selective than spiking activity. 
Integrating over 400-µm-diameter windows, the OSI for calcium
(mean ± s.e.m. = 0.59 ± 0.02; n = 19 regions) was sharper than that
for glutamate (mean ± s.e.m. = 0.44 ± 0.02; n = 37 regions; P < 0.001,
Mann-Whitney test). To determine if excitatory synaptic activity 
alone could account for single-vessel haemodynamic responses, we 
integrated the glutamate signals over 400-µm-diameter regions around
individual arteries (Fig. 3a, b) and compared these to vessel responses
from the same sites (Fig. 3b-e). We found that the visual stimulus
that produced the largest glutamate signal in a 400-µm window
matched the visual stimulus that resulted in the largest vessel dilation
(Fig. 3c; R = 0.90, n = 37 pairs, P< 10~). However, a 400-µm region
of synaptic activity was always more selective than its corresponding
penetrating arteriole (Fig. 3d; P< 107", paired t-test). The mismatch
between orientation selectivity in individual blood vessels and
synaptic activity was confirmed for a range of glutamate response
window sizes (100-600 µm diameter; Fig. 3e; P< 10~° at each
window size).

The broader tuning of the vascular response relative to synaptic 
and spiking activity (Figs 2 and 3) suggests that vessels can respond 
to sensory stimuli that evoke little to no concomitant neural activity 
in the surrounding tissue. This phenomenon can be directly observed 
by comparing the response amplitudes of vessel dilation and neural 
activity to individual stimulus conditions. Extended Data Figure 2a, b 
shows an example in which two visual stimuli (135° and 180°) evoked 
robust dilations in a penetrating arteriole but essentially no glutamate 
release in the region around the blood vessel. Across the data sets 
of synaptic and spiking activity, we compared the amplitude of each 
vessel's response to each sensory stimulus against the neural response 
around the vessel to the same stimulus (Extended Data Fig. 2c). Our 
analysis confirmed that there are many instances where there is a 
non-existent (or very small) synaptic or spiking response to a visual 
stimulus despite a robust dilation response. In general, there are very 
few instances where a stimulus failed to evoke a dilation response of 
some magnitude. 

Like orientation tuning, direction selectivity is a hallmark feature of 
the primary visual cortex and represents the capacity of a neuron to 
respond preferentially to one direction of stimulus motion at the opti- 
mal stimulus orientation. We found direction maps for excitatory syn- 
aptic activity (Fig. 3) that were qualitatively similar to direction maps of 
spiking activity (Fig. 2). However, the directionality index!® (DI) over
400-µm-diameter windows was greater for spiking activity than for synaptic
responses (DI spiking mean ± s.e.m. = 0.59 ± 0.07; n = 19 regions;
DI synaptic mean ± s.e.m. = 0.33 ± 0.03; n = 37 regions; P < 0.01;
Extended Data Fig. 3a). Blood vessel responses appeared to have little 
direction selectivity, even when surrounded by iso-direction territories 
of spiking activity, for example, Fig. 2b vessels 2 and 4. Indeed, across 
the population, the direction selectivity of vessels was smaller than that 
of regions of spiking activity (DI vessel mean ± s.e.m. = 0.30 ± 0.02;
n = 79 vessels; P < 0.0005). The population mean DI of vessels and synaptic
activity were similar (Extended Data Fig. 3a; P = 0.70) although
Figure 2 | Stimulus selectivity of single vessels and of spiking activity
in the surrounding tissue. a, In vivo anatomical image of a small region
of layer 2/3 cat visual cortex labelled with OGB-1 AM (green) and an
arteriole labelled with Alexa 633 (red). Polar plots show the sensory-evoked
calcium responses from six neurons and the dilation of the
arteriole. b, Pixel-based direction map and polar plots of responses from
another cat labelled with OGB-1 AM and Alexa 633. The pixels are colour-coded
by their preferred stimulus, with the brightness indicating the
response strength. Red polar plots show the dilation responses of the five
vessels indicated by solid white circles on the direction map. Green polar
plots show the calcium responses pooled in 400-µm-diameter windows
around four of these vessels (dashed circles). No calcium responses are
shown for the region around vessel number 1 because this vessel was near
the edge of the imaging field. c, Direction map and polar plots of responses
from cat visual cortex labelled with GCaMP6s and blood vessels labelled
with Texas Red Dextran. Left, tiled in vivo anatomical images of a large
region of layer 2/3 where the positions of seven penetrating arterioles are
numbered. Right, pixel-based direction maps of neural responses. The
polar plots at the bottom show the vessel dilation responses (red) and the
calcium responses (pooled over 400-µm-diameter regions) around five
of these vessels (green). d, Correlation between the preferred orientation
of the vessel dilation and the preferred orientation of calcium responses
in 400-µm-diameter windows around each vessel (R = 0.94; P< 107°;
n = 19 windows in 8 cats; regression line shown in red). e, No significant
correlation between calcium and vessel OSI (R = 0.41; P = 0.08). f,
Distribution of OSI for calcium responses across tissue regions of different
window sizes (n = 11 cats) and for the population of dilation responses
(n = 18 cats). Solid bars are medians and boxes indicate the interquartile
range. For all window sizes, calcium responses were more selective than
the vessel dilation (P < 0.0001, Mann-Whitney test). Pixel maps shown are
averages of five to six trials. Scale bars, 25 µm (a) and 100 µm (b, c).


there was no correlation between the direction selectivity of a particular 
vessel and the glutamate signals in the surrounding tissue (Extended 
Data Fig. 3b; R=0.20; n = 37 pairs; P=0.23). 

While vessel dilation responses over the population of parenchymal 
arterioles did not match neural orientation selectivity, we tested the 
possibility that the smallest vessels would show similar selectivity to 
neural responses. Larger penetrating arterioles (with baseline diameter
>15 µm) may perfuse larger regions of tissue than small penetrating
arterioles and their finer branches. Therefore, these smaller vessels
(typically 8-15-µm baseline diameter) may be sensitive to vasodilators
from smaller regions of neural tissue and thus have sharper orientation
tuning. Indeed we found that OSI was inversely correlated with baseline
vessel diameter in cat layer 2/3 (Extended Data Fig. 4a; R = 0.37,
P < 0.001). We compared the dilation responses of these small vessels
(baseline diameter <15 µm, mean = 11.8 µm) to those with baseline
diameter >15 µm (based on consistency of Alexa 633 labelling’; see
Methods). We found that the small vessels were slightly more tuned
(OSI mean ± s.e.m. = 0.24 ± 0.02; n = 35) than the larger ones (OSI
mean ± s.e.m. = 0.18 ± 0.01; n = 44; P < 0.05; Extended Data Fig. 4a, b).
Importantly, however, the OSI of these small vessels was still lower 
than synaptic and spiking activity over the full range of window sizes 
(P< 0.005; see Extended Data Fig. 4b, c). 

Capillaries are the smallest vessels in the neocortex and therefore 
may be even more tuned for stimulus orientation than small arteri- 
oles. However, whether capillaries have the capacity to dilate in vivo 
to sensory stimuli is controversial!*!*+!®!9. This is probably due to
inconsistent criteria for defining capillaries and distinguishing them
from pre-capillary arterioles as well as to the difficulty of detecting
dilation in very small vessels even with two-photon microscopy
resolution!?!+!8!9. However, a small dilation in a capillary that is
undetectable with two-photon microscopy would still lead to easily 
detectable changes in red blood cell (RBC) velocity. Because RBC 
size is slightly larger than the capillary lumen diameter, a very small 
dilation in a capillary could produce a dramatic reduction in the 
resistance to flow (see figure 2e in ref. 20). Therefore, we measured 
the stimulus-evoked changes in RBC velocity in a set of micro-vessels 
that would probably be classified as capillaries based on their high 
tortuosity and small diameter (4-7 µm; see Methods)*!”*. We found
that the orientation selectivity on the basis of blood velocity in these
capillaries (OSI mean ± s.e.m. = 0.30 ± 0.04; n = 15 vessels) was no
different from what was found for dilation of the <15-µm-diameter
vessels (P = 0.16; Extended Data Fig. 4a, b). To determine if the tuning
of capillaries was due to these vessels being in unusually broadly
tuned windows of neural activity, we compared the OSI of neural
activity around capillaries with what was found around parenchymal
arterioles. The OSIs of 400-µm-diameter windows of spiking activity
around capillaries (mean ± s.e.m. = 0.60 ± 0.03; n = 13) and around
arterioles (mean ± s.e.m. = 0.59 ± 0.02; n = 19) were indistinguishable
(P=0.94, Mann-Whitney test). Thus, stimulus-evoked changes even 
in capillaries were still not as selective as the responses in adjacent 
neural tissue. 

Our results suggest that blood flow increases in parenchymal vessels 
are partially driven by local neural activity (which would generate 
the match in orientation preference) and also by an additional global 
component arising from adjacent functional columns (which would 
induce the dilation to non-preferred orientations). One possibility 
is that this global component is due to the propagation of dilatory 
signals along vessel walls. Specifically, the lack of orientation tuning 
in surface arteries could result from the dilation of penetrating arte- 
rioles from many different orientation domains propagating back to 
a surface artery. Then the propagation of dilation along the surface 
artery and down into multiple penetrating arterioles could broaden 
the locally driven dilation signal, leading to dilation in adjacent regions 
of tissue that have no concomitant neural activity. Previous studies 
in rodents have demonstrated the propagation of dilation from the 
parenchyma up to the cortical surface** and along the surface over 
distances of at least 1 mm (refs 24-26) at rapid speeds”®, but these 
have not been linked to the stimulus selectivity of vessel or neural 
responses. Consistent with this propagation of dilation hypothesis, 
we found that in cat visual cortex parenchymal vessels dilate before 
the surface vessels and that the dilation to the preferred orientation 
came before the dilation to the stimulus that was oriented orthogo- 
nal to the preferred orientation, that is, at the null orientation (see 
Supplementary Information and Extended Data Fig. 5). Alternative
Figure 3 | Stimulus selectivity of single vessels and of excitatory synaptic
activity in the surrounding tissue. a, Bright-field image of the surface
of cat visual cortex showing the location of six penetrating arterioles and
the regions targeted for two-photon imaging. b, Direction maps and polar
plots of glutamate responses from cortical neurons
labelled with iGluSnFR and dilation from blood vessels labelled with
Texas Red Dextran. The positions of the arterioles and the 400-µm-diameter
windows of pooled glutamate responses in cortical layer 2/3 are
indicated by solid white and dashed circles, respectively. Red polar plots
show the dilation responses and blue plots show the pooled glutamate
activity in the windows around each arteriole. c, Correlation between
the preferred orientation of the dilation responses and the preferred
orientation of glutamate activity in 400-µm-diameter windows around
each vessel for all cat data (R = 0.90, P< 10~°; n = 37 windows in 5 cats).
d, No significant correlation between glutamate and vessel OSI (R = 0.28,
P = 0.08; linear regression line shown in red). e, Distribution of OSI for
glutamate activity across windows of different sizes (n = 5 cats) and for the
population of dilation responses (n = 18 cats). Solid bars are medians and
boxes indicate the interquartile range. For all window sizes, the glutamate
responses were more selective than vessel dilation (P< 107°, Mann-Whitney
test). Pixel maps shown are averages of 8-10 trials. Scale bars,
500 µm (a) and 100 µm (b).


hypotheses on the origin of the selectivity of vessel dilation are 
discussed in Supplementary Information. 

In conclusion, our results have a number of implications for the 
interpretation of haemodynamic signals in relation to neural activ- 
ity. We provide direct single-vessel evidence for the untuned global 
signal in the pial vasculature that has been found in low-resolution
haemodynamic imaging studies”””*. After subtracting the global
signal, these earlier studies often suggested that the residual tuned 
vascular responses were of capillary origin’”**. We show that indi- 
vidual penetrating arteries also display stimulus-specific responses. 
Furthermore, the orientation selectivity of these parenchymal vessels 
is an order of magnitude higher than what is obtained with intrinsic 
signal optical imaging”’. We also demonstrate that an organized func- 
tional map of neural responses is required for attaining tuned haemo- 
dynamic signals (see also ref. 30). Furthermore, by sampling responses 
over the full range of a stimulus parameter and by directly measuring
synaptic and spiking activity along with single-vessel responses in 
precisely defined spatial regions of tissue, we overcome many of the 
technical limitations of earlier studies that examined neurovascular 
coupling. The difficulties inherent in correlating low-resolution vas- 
cular signals with electrophysiological metrics of neural activity and in 
interpreting glutamate pharmacology have led to controversy regarding
the spatial scale over which synaptic versus spiking activity matches 
vascular signals (see Supplementary Information). Here we establish 
that the sensory stimulus that elicits the largest synaptic or spiking 
response also produces the largest haemodynamic signal. However, 
the complete selectivity profile of neither synaptic nor spiking activ- 
ity in the local tissue around a vessel can be inferred from the tuning 
curves of haemodynamic signals. Thus, vascular signals are partially
decoupled from local neural signals over distances of at least 300 µm.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper.

METHODS


Animals and surgery. All surgical and experimental procedures were approved 
by the Institutional Animal Care and Use Committee at Medical University of 
South Carolina. Cats (n= 25 of either sex; postnatal day 28 to >2.5 kg adult) were 
anaesthetized with isoflurane (1-2% during surgery, 0.5-1.0% during imaging) and 
paralyzed with a continuous infusion of vecuronium bromide (0.2 mg kg⁻¹ h⁻¹,
intravenously). Cats were artificially ventilated through a tracheal cannula, and the
end-tidal CO₂ was regulated at 3.5-4.5%. Heart rate, respiration rate, temperature
and electroencephalogram were also monitored. Long Evans rats (n = 10 males,
postnatal days 31-45) and C57BL/6J mice (n = 1 male, postnatal day 63) were
initially anaesthetized with a bolus infusion of fentanyl citrate (0.04-0.06 mg kg⁻¹),
midazolam (3.75-6.25 mg kg⁻¹), and dexmedetomidine (0.19-0.31 mg kg⁻¹). The
one mouse was used for a control experiment to confirm that the iGluSnFR sensor
was not being saturated during sensory stimulation (see later). During two-photon
imaging, continuous intraperitoneal infusion with a lower concentration mixture
(fentanyl citrate: 0.02-0.03 mg kg⁻¹ h⁻¹; midazolam: 1.50-2.50 mg kg⁻¹ h⁻¹; and
dexmedetomidine: 0.10-0.25 mg kg⁻¹ h⁻¹) was administered using a catheter
connected to a syringe pump. 

For all animals, craniotomies (2-3 mm square) were opened over the primary 
visual cortex (area 18), the dura was reflected, and the craniotomies were sealed 
with agarose (1.5-3% dissolved in artificial cerebrospinal fluid (ACSF)) and a glass 
coverslip. When the calcium indicator OGB-1 AM was used, before the place- 
ment of the coverglass, a pipette was inserted into the craniotomy and the dye was 
injected with air pressure puffs. The dye loading procedure has been described 
in detail?!. 

In cats, we also used the genetically encoded indicators GCaMP6s* and
iGluSnFR* to measure calcium and glutamate activity, respectively. Two-to-four
weeks before the imaging session, viral injections of AAV2/9.hSyn.GCaMP6s. 
WPRE.SV40 or AAV2/1.hSyn.iGluSnFR.WPRE.SV40 were performed under 
sterile surgery conditions. Cats were anaesthetized with 1-2% isoflurane and vital 
signs were monitored. One to three craniotomies were performed over the pri- 
mary visual cortex (area 18) and small holes were made in the dura. Aliquots of 
virus (5 µl) were diluted in PBS and mannitol (5:9:6 ratio of virus:PBS:mannitol)
to titres of ~10'* genomes ml⁻¹ with 50-200 nl of Fast Green dye (Sigma) added
to visualize the injection. Glass pipettes containing the virus solution were lowered
500-800 µm into the cortex and pressure puffs were administered over 15-20 min
until approximately 1 µl had been injected. After 10 min, the pipettes were slowly
retracted, the craniotomies were sealed with agarose (3% dissolved in ACSF), the 
scalp was sutured closed and the animals were recovered and returned to their 
housing. All animals were treated similarly and so randomization and blinding 
were not required. No statistical methods were used to predetermine sample size. 
Two-photon imaging. For vascular imaging, three fluorescent dyes were used 
as we described previously". Alexa 633 fluor hydrazide selectively labels artery 
walls, while Texas Red dextran (70 kDa) and fluorescein dextran (2,000 kDa) 
indiscriminately label the entire vascular lumen. Fluorescein dextran has similar 
excitation and emission properties as our neuronal labels OGB-1 AM, GCaMP6s, 
and iGluSnFR. Therefore, fluorescein dextran was not used for vessel dilation 
measurements in animals where neuronal imaging was performed because suf- 
ficient contrast between a vessel wall and background is more difficult to obtain 
when two green labels are used simultaneously. Fluorescein dextran was typically 
used for the measurement of RBC velocity and was only injected after the neural 
imaging was completed. 

Fluorescence was monitored with a custom-built microscope (Prairie 
Technologies) coupled with a Mai Tai (Newport Spectra-Physics) mode-locked 
Ti:sapphire laser (810 nm or 920 nm) with DeepSee dispersion compensation. 
Excitation light was focused by a ×40 (NA 0.8, Olympus), ×20 (NA 1.0, Olympus)
or ×16 (NA 0.8, Nikon) water immersion objective and beam expansion optics.
Full frame imaging of neural activity and vessel dilation were typically obtained at 
approximately 0.8 Hz. All the blood velocity data and the dilation of a small number 
of vessels were measured with line scans rather than full frame imaging by using 
line acquisition rates between 0.4 and 4.2 kHz. 

Visual stimulation and size of imaged region. Drifting square-wave grating stimuli 
were presented on a 17-inch LCD monitor. The gratings were presented at 100% contrast,
30 cd m⁻² mean luminance, 1.5-2.0 Hz temporal frequency. As depicted in the
various time courses (for example, Fig. 1) these stimuli were presented at eight direc- 
tions of motion in 45° steps and each of these eight stimuli was interspersed with blank 
periods (equiluminant grey screen). Because vascular responses decay slowly, we used 
long blank periods (at least four times the stimulus duration) when measuring blood 
vessel responses. We also presented the eight visual stimuli in pseudo-random order. 
These steps ensured that a particular response would not be influenced by a residual 
response to the previous stimulus. The duration of the stimulation period, for example,
6 s, and the duration of the blank period, for example, 24 s, were always identical
across all epochs in a stimulus sequence. Each of the eight stimuli was repeated at least
three times and in the vast majority of the data, 5-10 trials were used. Unlike arteriole
dilation, neural transients return to baseline nearly immediately upon extinguishing 
the visual stimulus!” (see also Extended Data Figs 1 and 2). Therefore, for epochs 
of data collection involving only calcium or glutamate imaging, either sequential or 
pseudo-random sequences were used. While some neural and vascular data were 
collected simultaneously, our analyses benefited from collecting them sequentially 
for the following reasons. Neural data was typically collected in 600 × 600 µm square
regions to allow the pooling of large regions of activity (see Figs 2b, c and 3b) and
multiple 600 × 600 µm square regions were often imaged in a single craniotomy to
obtain maps where many orientation and direction domains were represented (see 
Figs 2b, c, 3b and Extended Data Fig. 1). Higher pixel resolution was needed for 
resolving blood vessel dilation so we usually obtained the vessel responses immedi- 
ately after the neural responses using higher optical zooms that were centred on the 
blood vessels of interest. Because of the optical zoom customization per imaged site 
and the differences in recovery of neural versus vascular responses to baseline, the 
selected duration of visual stimulation for a particular experiment ranged from 2 to 
8 s and the duration of blank periods ranged from 6 to 35 s.
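
The presentation scheme described above (eight directions in 45° steps, pseudo-random order, each stimulus preceded by a blank period) can be illustrated with a minimal Python sketch. The function name `make_schedule`, its parameters and the fixed random seed are hypothetical and are not taken from the original stimulus software; the check that the blank lasts at least four times the stimulus duration reflects the rule stated for blood vessel measurements only.

```python
import random

def make_schedule(stim_s=6.0, blank_s=24.0, n_trials=5, seed=0):
    """Return a list of (direction_deg, onset_s, offset_s) tuples.

    Illustrative only: durations default to the 6 s / 24 s example in the text.
    """
    if blank_s < 4 * stim_s:
        raise ValueError("blank period should be at least 4x the stimulus duration")
    directions = [k * 45 for k in range(8)]  # eight directions in 45-degree steps
    rng = random.Random(seed)                # seeded so the sequence is reproducible
    schedule, t = [], 0.0
    for _ in range(n_trials):
        order = directions[:]
        rng.shuffle(order)                   # pseudo-random order within each trial
        for d in order:
            t += blank_s                     # equiluminant grey blank before each stimulus
            schedule.append((d, t, t + stim_s))
            t += stim_s
    return schedule
```

Each trial in the returned list contains every direction exactly once, and stimulus and blank durations are identical across all epochs, as required in the text.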

Data-analysis overview. Images were analysed in Matlab (Mathworks) and ImageJ 
(National Institutes of Health). Data with movements of several µm in XY or Z
were excluded. Data with small drift movements were realigned by maximizing 
the correlation between frames. 
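
One common way to realign frames by maximizing their correlation is sketched below in Python with NumPy. This is an assumption about the general approach, not the authors' Matlab code: it finds the integer pixel shift at the peak of the circular cross-correlation (computed by FFT) and applies it with `np.roll`. The function names `estimate_shift` and `realign` are hypothetical.

```python
import numpy as np

def estimate_shift(reference, frame):
    """Integer (dy, dx) shift that, applied to `frame` with np.roll, best matches `reference`."""
    ref = reference - reference.mean()
    frm = frame - frame.mean()
    # Circular cross-correlation via FFT; its peak marks the best-matching shift.
    xcorr = np.fft.ifft2(np.fft.fft2(ref) * np.conj(np.fft.fft2(frm))).real
    dy, dx = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Shifts beyond half the image size wrap around to negative values.
    if dy > reference.shape[0] // 2:
        dy -= reference.shape[0]
    if dx > reference.shape[1] // 2:
        dx -= reference.shape[1]
    return int(dy), int(dx)

def realign(reference, frame):
    """Shift `frame` (with wrap-around) so that it overlays `reference`."""
    dy, dx = estimate_shift(reference, frame)
    return np.roll(frame, shift=(dy, dx), axis=(0, 1))
```

The wrap-around at the image edges is a simplification; it is only reasonable for the small drift movements mentioned in the text.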

Quantifying vessel dilation. We analysed dilation responses only in surface arter- 
ies and penetrating arterioles because veins and capillaries do not typically dilate to 
sensory stimuli less than 10 s in duration'?"4. When veins dilate in response to very
long duration sensory stimuli, these responses are relatively weak and extremely 
slow, unlike the rapid and large responses of arteries and arterioles!’. Surface arter- 
ies and penetrating arterioles are distinguished from veins using Alexa 633 as an 
artery-specific dye, by their orange versus purplish hue under bright-field illumi- 
nation, and by the tone of the vessel walls and the speed and direction of blood 
flow during two-photon imaging”. Distinguishing capillaries from pre-capillary 
arterioles has been inconsistent in the literature!*!*1*°. Here we categorize
capillaries as vessels with 4-7 µm baseline diameter’, high tortuosity and complete
lack of Alexa-633 labelling!”. Vessel diameter was determined in full-frame images 
by one of two methods. When the vessel had a circular profile (as was the case 
for most of the parenchymal arterioles), a region of interest was manually drawn 
around each vessel and a circle was fit to the pixels in the region that passed a 
luminance threshold (Extended Data Fig. 6). For vessels with an elongated profile 
(typical for pial surface vessels), a cross-section was taken through the vessel walls 
and the peaks in luminance (for the wall-labelling Alexa 633) or peaks in the 
pixel-by-pixel luminance difference along the line (for lumen labels) were used to 
compute the diameter’? (see Extended Data Fig. 7). For the few instances where 
diameter was measured using line-scans, we averaged the data along the time axis 
over all the lines in an image (usually 1,000 lines) or obtained two data points per 
image by sequentially averaging half the lines in each image. Since each line was 
only 0.25-2.5 ms in duration, averaging across these lines still provided sufficient 
temporal resolution for capturing the onset, peak and recovery of sensory-evoked 
dilation. The diameter was computed from these line scans in the same way as for 
the cross-section (Extended Data Fig. 8). For all methods of dilation measurement, 
images were usually oversampled by interpolating between pixels from 2 to 20 
times to allow the algorithm to compute diameter values with a spatial resolution 
that was finer than the pixel size in the raw data images. To compute the vascular 
response to each condition, a stimulus response window was defined. Because of 
the slow onset and offset of the vascular response, we could not simply assign the 
response period to correspond to the period when the stimulus was displayed on 
the monitor. Instead, for each vessel we selected the response period by examining 
the average response across all stimulus conditions and then selected the imaging 
frames that best approximated this response interval. Shifting this time window 
by adding or removing data points did not appreciably change the responses. 
The mean response across this time window was divided by the baseline level 
for each condition to get the percentage change in diameter. Responsive vessels 
were defined by ANOVA across baseline and eight directions over multiple trials
(P<0.05). The orientation selectivity index (OSI) was defined as:

$$\mathrm{OSI} = \left| \frac{\sum_k r_k e^{i2\theta_k}}{\sum_k r_k} \right|$$

where $\theta_k$ is the orientation of each stimulus and $r_k$ is the mean response across trials
to that stimulus*>. Note that OSI = 1 − circular variance. The preferred orientation
was defined as: $\arctan\left(\sum_k r_k \cos(2\theta_k) / \sum_k r_k \sin(2\theta_k)\right)$. The directionality index (DI)
was computed as $1 - r_{\mathrm{null}}/r_{\mathrm{pref}}$, where $r_{\mathrm{pref}}$ is the response amplitude to the preferred
stimulus and $r_{\mathrm{null}}$ is the response to the stimulus with the same orientation
drifting in the opposite direction. Computing the OSI based on flow rather than
diameter by scaling the diameter values to the 4th power (Poiseuille’s law) did not 
affect our results. 
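
The OSI and DI definitions above can be written compactly in Python with NumPy. The sketch below is illustrative rather than the authors' Matlab implementation; the function names are hypothetical, and it assumes that the stimulus set contains the direction exactly 180° from the preferred one and that the preferred stimulus is simply the one with the largest mean response.

```python
import numpy as np

def osi(directions_deg, responses):
    """OSI = |sum_k r_k exp(i*2*theta_k)| / sum_k r_k (1 minus circular variance)."""
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    r = np.asarray(responses, dtype=float)
    return float(np.abs(np.sum(r * np.exp(2j * theta)) / np.sum(r)))

def directionality_index(directions_deg, responses):
    """DI = 1 - r_null / r_pref, with r_null taken 180 degrees from the preferred direction."""
    d = np.asarray(directions_deg, dtype=float) % 360
    r = np.asarray(responses, dtype=float)
    i_pref = int(np.argmax(r))               # assumed: preferred = largest mean response
    null_dir = (d[i_pref] + 180.0) % 360
    i_null = int(np.flatnonzero(np.isclose(d, null_dir))[0])
    return float(1.0 - r[i_null] / r[i_pref])
```

With eight directions in 45° steps, equal responses to all stimuli give an OSI of 0, and a response confined to one orientation (both directions of motion) gives an OSI of 1.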

Measuring onset latency to dilation. To compare the latencies of pial arteries 
and parenchymal arterioles, we fit a linear regression line to the rising phase of 
the dilation (20-80% of the peak response) of each vessel. For parenchymal ves- 
sels, we used only the response to the preferred orientation because of potential 
latency differences between dilation to the preferred and other stimulus orienta- 
tions (see later). For the pial vessels, we pooled the response to all stimulus condi- 
tions because these vessels are untuned to stimulus orientation. We used the time 
at which the regression line crossed the pre-stimulus baseline level as the onset 
latency’. This regression line metric on the average response is applicable when 
responses are large and relatively stable from trial to trial—as is the case for pial 
vessels to any stimulus orientation and for parenchymal vessels to the preferred 
stimulus orientation. 
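
The regression-based onset latency can be sketched as follows in Python with NumPy. This is a minimal illustration under stated assumptions: the baseline is taken as the mean of all pre-stimulus samples, the peak is the maximum after stimulus onset, and the 20-80% range is measured relative to baseline. The function name `onset_latency` and its arguments are hypothetical.

```python
import numpy as np

def onset_latency(t, trace, stim_onset=0.0):
    """Time at which a line fitted to the 20-80% rising phase crosses the baseline."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(trace, dtype=float)
    baseline = y[t < stim_onset].mean()          # assumed: mean of pre-stimulus samples
    post = t >= stim_onset
    i_peak = np.flatnonzero(post)[np.argmax(y[post])]
    amp = y[i_peak] - baseline
    lo, hi = baseline + 0.2 * amp, baseline + 0.8 * amp
    idx = np.arange(len(t))
    # Samples between stimulus onset and the peak that lie within 20-80% of the peak.
    rising = post & (idx <= i_peak) & (y >= lo) & (y <= hi)
    if rising.sum() < 2:
        raise ValueError("too few samples on the rising phase for a line fit")
    slope, intercept = np.polyfit(t[rising], y[rising], 1)
    return float((baseline - intercept) / slope)
```

As the text notes, such a fit is only meaningful when the response is large and stable; for noisy traces the fitted line, and hence the baseline crossing, becomes unreliable.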

Since parenchymal vessels are orientation selective (Fig. 1b, d), responses to 
the null orientation are the weakest and, by definition, smallest in amplitude and 
more noisy from trial to trial. Thus, to compare the latency between the response 
to the preferred and null orientations in parenchymal vessels, we used a statistical 
test, the standardized mean difference (SMD, specifically Hedges’ g; refs 36, 37),
in which vessels are weighted by the trial-by-trial variance in latency values (see 
Extended Data Fig. 5a). We first smoothed each trial’s time course with a three- 
frame running average. We then performed linear regression on the same interval 
as earlier (20-80% of the peak). We took the difference in the average onset latency 
across trials between the responses to preferred and null stimuli and standardized 
this difference by the pooled variability across the two conditions. The population 
summary SMD was obtained by using a random-effect model. This model weighs 
each vessel by the inverse variance of its SMD and factors in the heterogeneity 
present across the individual vessel data*’. As a control for spurious effects, the 
preferred and null responses were assigned randomly for each trial and the analysis 
was repeated (Extended Data Fig. 5b). 
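
For readers unfamiliar with this kind of analysis, the sketch below shows one standard way to compute Hedges' g per vessel and to pool the values with a random-effects model in Python with NumPy. It is an assumption-laden illustration, not the authors' code: it uses the independent-groups form of Hedges' g with its usual small-sample correction and the DerSimonian-Laird estimator of between-vessel heterogeneity, neither of which is specified in the text. The function names are hypothetical.

```python
import numpy as np

def hedges_g(a, b):
    """Hedges' g and its approximate variance for two sets of per-trial latencies."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    s_pooled = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2))
    d = (a.mean() - b.mean()) / s_pooled
    j = 1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0)      # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2.0 * (n1 + n2))
    return float(j * d), float(j ** 2 * var_d)

def random_effects_summary(g, v):
    """Pooled SMD, its standard error and tau^2 (DerSimonian-Laird, assumed here)."""
    g = np.asarray(g, dtype=float)
    v = np.asarray(v, dtype=float)
    w = 1.0 / v                                   # inverse-variance weights
    g_fixed = np.sum(w * g) / np.sum(w)
    q = np.sum(w * (g - g_fixed) ** 2)            # heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(g) - 1)) / c) if c > 0 else 0.0
    w_star = 1.0 / (v + tau2)                     # weights including heterogeneity
    summary = np.sum(w_star * g) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return float(summary), float(se), float(tau2)
```

Here `hedges_g` would be called once per vessel with the preferred-orientation and null-orientation latencies, and the resulting g values and variances passed to `random_effects_summary`.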

Quantifying blood velocity. Velocity data was analysed as described previously”. 
Briefly, line scans were first pooled into blocks of 250, 500, or 1,000 lines. The angle 
of the RBC streaks in each image was used to determine the velocity of that block 
and a time course of velocity measurements was extracted. Baseline and stimu- 
lus windows were defined similarly to the dilation data and equivalent OSI and 
statistical analyses were performed. 
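
The conversion from streak angle to velocity is not spelled out here, so the following Python sketch states its geometry as an assumption: each image row is one line scan (time runs down the rows), each column is one position along the vessel, and the streak angle is measured from the horizontal (space) axis. Under that convention the speed is the pixel size times the line rate divided by the tangent of the angle. The function name and parameters are hypothetical.

```python
import math

def rbc_velocity_um_per_s(streak_angle_deg, um_per_pixel, line_rate_hz):
    """RBC velocity from the streak angle in a space-time line-scan image.

    Assumed convention: angle measured from the space (horizontal) axis,
    so a vertical streak (90 degrees) corresponds to a stationary cell.
    """
    if streak_angle_deg % 180 == 0:
        raise ValueError("horizontal streak: velocity is unbounded")
    pixels_per_line = 1.0 / math.tan(math.radians(streak_angle_deg))
    return um_per_pixel * line_rate_hz * pixels_per_line
```

For example, a 45° streak at 0.5 µm per pixel and a 1 kHz line rate corresponds to about 500 µm s⁻¹; the sign of the result indicates the direction of flow along the scan line.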

Analysis of calcium and glutamate responses. Calcium and glutamate signals 
were analysed the same way. Raw images were first smoothed with a 4 µm Gaussian
filter. The mean fluorescence of each pixel within a given 100-600-µm-diameter
window around a vessel was computed for each blank and stimulus epoch. A t-test
was performed on the difference in stimulus and baseline fluorescence for each 
condition in each trial and if the distribution was significantly higher than zero 
(P< 0.05), the pixel was included in the integration window. We also performed 
this analysis without excluding the unresponsive pixels and the responses were 
indistinguishable. In some data sets, part of the 100-600-µm-diameter analysis
window fell outside of the image boundary and so there would be fewer pixels 
from those domains contributing to the overall response. Therefore, to avoid 
biasing the overall response of the integrated region, we divided the 100-600-µm-diameter
analysis window into wedges before averaging the data over the full
window. Each wedge was 1/16 of the circle and was further divided into sections
of 50 µm radial length. Thus, a 100-µm-diameter window had 16 sections whereas
a 400-µm-diameter window had 64. The pixels with significant responses within
each section were averaged together to create a time course. The time course was 
then normalized by a sliding baseline of the mean fluorescence of each blank 
interval (ΔF/F). Each time course was then weighted by the total number of
pixels represented by its section, because sections farther from the vessel contain 
more pixels. Finally, the time courses of all the sections were averaged together to 
obtain the time course of the entire region. For inclusion in the population data 
set, responses from the 100-600-µm-diameter analysis windows had to pass the
following criteria. First, each wedge had to have at least 30% of the imaged pixels 
passing the initial t-test to ensure that windows with wedges having no response 
and/or weak labelling would be removed. Second, at least 80% of the circular area
of the window had to be within the image to ensure that a sizeable region of tissue
whose orientation preference could dramatically affect the overall response was 
not being missed. In addition, at least 10% of each wedge had to be within the 
image to ensure that each wedge had some representation. For data that passed 
all these criteria, the responses to each condition were computed by averaging 
the imaging frames during stimulus presentation and across trials. Before the 
OSI was computed, if any conditions showed a negative response (below the 
baseline level), then the absolute value of the minimum response was added to 
all responses (to make the minimum equal zero). We have recently published a 
mechanistic rationale for applying such a correction in fluorescence imaging of 
neural responses—stimulus-evoked dilation of surface arteries can block fluores- 
cence from the underlying tissue and make a very small response actually appear 
negative!?. We also analysed the data without this correction and, in addition, 
when only including the first 1 s of the response (to avoid the slower surface artery 
interference’”). Although there were small changes in the OSI values of individual 
windows, the overall results did not change in either case. The neural response 
amplitudes, OSI and DI were all computed using the same formulae as for the 
vessel data. Population distribution statistics on OSI and DI measurements used 
the Mann-Whitney test. 
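
The wedge-and-section bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used in the study: the function names (`n_sections`, `weighted_time_course`, `shift_to_nonnegative`) are hypothetical, and the sketch assumes that the pixel weighting amounts to a pixel-count-weighted mean across sections.

```python
from typing import List, Sequence

N_WEDGES = 16            # each wedge is 1/16 of the circle (from the text)
SECTION_LENGTH_UM = 50   # radial length of one section, in micrometres


def n_sections(window_diameter_um: float) -> int:
    """Number of sections in a circular window of the given diameter."""
    radial_steps = int((window_diameter_um / 2) // SECTION_LENGTH_UM)
    return N_WEDGES * radial_steps


def weighted_time_course(courses: Sequence[Sequence[float]],
                         pixel_counts: Sequence[int]) -> List[float]:
    """Average per-section time courses, weighting each by its pixel count.

    Assumption: the weighting is a plain pixel-count-weighted mean.
    """
    total = sum(pixel_counts)
    n_frames = len(courses[0])
    return [
        sum(course[t] * weight for course, weight in zip(courses, pixel_counts)) / total
        for t in range(n_frames)
    ]


def shift_to_nonnegative(responses: Sequence[float]) -> List[float]:
    """If any response is below baseline, add |min| so the minimum equals zero."""
    lowest = min(responses)
    if lowest < 0:
        return [r - lowest for r in responses]
    return list(responses)


print(n_sections(100))  # 16, as stated in the text
print(n_sections(400))  # 64, as stated in the text
```

With 50-μm sections, a window of radius 50 μm has one radial step per wedge and a window of radius 200 μm has four, which reproduces the 16 and 64 sections quoted above.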

Additional control for calcium imaging. Spiking activity in the neuropil should 
also contribute to metabolic needs and hence neurovascular coupling. Therefore, 
when integrating the calcium signals in the tissue surrounding each artery, we 
included all pixels that passed a signal-to-noise criterion (see earlier) and not 
only those corresponding to cell bodies. However, the neuropil may include a 
mixture of calcium signals from synaptic events in dendritic spines and spiking 
in axons arriving from regions outside of the integration window we selected. 
Therefore, as a control, we compared the orientation selectivity in 400-μm-diameter
windows with and without including the neuropil. Masks excluding the neuropil 
were generated in the same way as described earlier except that the pixels within 
each wedge were constrained to the cell bodies. Cell body masks were first cre- 
ated using an automated algorithm that applied a series of morphological filters 
to identify the contours of cell bodies based on intensity, size and shape’°. Cell 
outlines were visually inspected and errors were corrected manually. Then a t-test 
was performed on each pixel of these masks and the wedges were created in the 
same manner as before. Because of the sparse distribution of cell bodies, we did 
not enforce the 30% significantly responding imaged pixels criterion but all other 
criteria applied. The orientation selectivity with the two mask types was indistin- 
guishable (Extended Data Fig. 9). 

Control to show that visual stimulation was not saturating the iGluSnFR sensor. 
With visual stimulation, the glutamate signals peaked at <10% ΔF/F. To determine
if the iGluSnFR sensor responded linearly and responded over a greater range
than that obtained with visual stimulation, we used iontophoresis to apply large
doses of exogenous glutamate. We lowered a pipette containing 0.5 M glutamate
into layer 2/3 of the visual cortex of a mouse that was labelled with iGluSnFR. We
applied a range of currents (10, 20, 40, 60, 80 nA) and found that the fluorescence
signals increased linearly (R > 0.99; P < 0.0001) and peaked at ~60% ΔF/F (data
not shown). Thus, our in vivo imaging with iGluSnFR (for example, Extended Data 
Fig. 10) is probably revealing the true spatial profile of glutamate direction maps 
(Fig. 3) and orientation maps (Extended Data Fig. 1). 
Extended Data Figure 1 | Glutamate release is organized into
orientation maps. a, Region of cat visual cortex labelled with iGluSnFR.
Pixels are colour-coded by preferred orientation with the brightness
indicating the response strength. Time courses and polar plots (averages of
four trials) are shown for three regions of tissue with different orientation
preferences. b, Orientation maps of iGluSnFR responses from a different
cat. Time courses and polar plots are averages of ten trials.
Extended Data Figure 2 | Arteriole dilation in the absence of glutamate
signalling or local spiking. a, Time courses and polar plots of arteriole
dilation (red) and the release of glutamate in a 400-μm-diameter window
surrounding an arteriole (blue). Averages of eight trials are shown for
vessel dilation and ten trials for glutamate responses. In time courses,
error bands represent s.e.m. and grey bars represent the periods of visual
stimulation. The responses to the 135° and 180° stimuli (outlined by
the black box over the time courses) are large for the vessel dilation but
virtually non-existent for the glutamate activity. b, Quantifying the relative
amplitude of the vessel and neural responses to each of the eight stimulus
directions for the single cat experiment shown in a. Each data point in the
scatterplot represents the average response of the vessel and of the neural
tissue surrounding it to a single direction of visual stimulation, normalized
by the response to the best direction. c, Quantifying the relative amplitude
of vessel and neural responses across all cat experiments. Top panel shows
glutamate versus dilation data (n = 37 windows and vessels in 5 cats)
and the bottom panel shows calcium versus dilation responses (n = 19
windows and vessels in 8 cats). Each data point in the scatterplot is as
described in b. The histograms at the top and right show the distributions
of neural and dilation responses, respectively. In both population
scatterplots, there are many data points in the top left quadrant, indicating
stimuli that drove robust dilation responses but minimal glutamate or
calcium responses. All data are from cat visual cortex layer 2/3.
Extended Data Figure 3 | Direction selectivity of parenchymal vessels
and of local spiking and synaptic activity. a, Population distributions of
the direction index of calcium (green, n = 19 windows in 8 cats), glutamate
(blue, n = 37 windows in 5 cats) and vessel dilation (red, n = 79 vessels
in 18 cats) responses. All data were obtained from cat visual cortex and
neural responses were pooled over 400-μm-diameter windows. The DI of
spiking activity was greater than the DI of synaptic responses (P < 0.01,
Mann-Whitney test) and the DI of vessel dilation (P < 0.0005,
Mann-Whitney test). The DI of synaptic activity was not different from
the DI of vessel dilation (P = 0.70, Mann-Whitney test). Solid bars are
medians and boxes show the interquartile range. b, For each vessel that
had a corresponding 400-μm-diameter window of calcium or glutamate
activity, the vessel direction index is plotted against the corresponding
neural direction index. There was no significant correlation for calcium
(R = 0.2, P = 0.43, n = 19 pairs) or glutamate (R = 0.2, P = 0.23, n = 37
pairs).
Extended Data Figure 4 | Dilation and velocity responses in
parenchymal blood vessels with different baseline diameters. a, The
diameter of all vessels and their OSI values from cat visual cortex layer
2/3. For arterioles, OSI was determined based on dilation (n = 79 vessels
in 18 cats) whereas for capillaries, OSI was calculated from blood velocity
measurements (n = 15 vessels in 7 cats). b, The distribution of OSI for
the three subgroups of layer 2/3 vessels analysed in our study (>15 μm,
n = 44 vessels in 15 cats; <15 μm, n = 35 vessels in 14 cats; capillaries,
n = 15 vessels in 7 cats). The OSI of the <15 μm vessels was greater than
the OSI of the >15 μm vessels (P < 0.05, Mann-Whitney test). The OSI
of the <15 μm vessels was not different from the OSI of the capillaries
(P = 0.16, Mann-Whitney test). Solid bars are medians and boxes indicate
the interquartile range. c, The OSI distribution of dilating vessels and
400-μm-diameter windows of calcium and glutamate responses.
Extended Data Figure 5 | Onset latency of dilation in parenchymal
vessels. a, Vessel-by-vessel comparison of the onset latency difference
between the response to preferred and orthogonal (null) stimulus
orientations. Each whisker diagram represents a single vessel with the
circle position indicating the standardized mean difference (SMD;
calculated as Hedges' g) in latency. The whisker length represents the 95%
confidence interval (CI) of the SMD. The size of the circle represents the
weight given to the vessel when calculating the population summary SMD.
The population summary SMD is shown by the solid square with the error
bands giving the 95% CI. The population average shows that parenchymal
vessels responded significantly faster for the preferred than the null
stimulus orientation. b, As a control, the analysis shown in a was repeated
after randomizing the assignment of preferred and null on individual trials
for each vessel. All data are from cat visual cortex layer 2/3 (n = 79 vessels
in 18 cats).
Extended Data Figure 6 | Dilation measurements with circle fitting.
a, The steps of the circle fitting algorithm are illustrated for a blank and a
stimulus frame corresponding to the penetrating arteriole shown in the
bottom panel of Fig. 1b. The raw image data (first panel) is oversampled
by linear interpolation between pixels (second panel). Then a luminance
threshold (a fraction of the gradient between the brightest and darkest
pixel of the image) is applied (third panel). Finally, a two-dimensional
Sobel filter is applied to the thresholded pixels to detect the edge of the
vessel (fourth panel). The circle fit is only applied to the pixels in the
fourth panel but it is overlaid on all the panels for illustration purposes.
b, As the threshold is increased, fewer pixels pass the threshold and
therefore the baseline diameter changes. However, the percentage change
in diameter across baseline and stimulus presentations (the response
amplitude) and the response selectivity remain the same. Note that for
vessel geometries needing an elliptical fit rather than a circular fit (see
Extended Data Fig. 8c and Methods), the shorter axis of the fitted ellipse
was used to estimate the vessel diameter.


© 2016 Macmillan Publishers Limited. All rights reserved 


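[Figure panels for Extended Data Figure 7. Panel a (scale bar, 50 μm): cross-sections include one drawn perpendicular to the vessel wall; section 1, OSI = 0.044; section 3, OSI = 0.036. Second panel (scale bar, 100 μm): cross-sections include one drawn perpendicular to the vessel wall; section 1, OSI = 0.042; section 3, OSI = 0.032.]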


Extended Data Figure 7 | Dilation measurements using the cross-section
algorithm do not depend on the precise location and angle
of the selected cross-section. a, Example cat pial artery (from Fig. 1c)
labelled with Texas Red Dextran. b, Another pial artery from a different
cat labelled with the artery-specific dye Alexa 633. Both arteries show
similar tuning for cross-sections drawn >100 μm apart and also drawn
perpendicular and obliquely relative to the vessel walls.
Extended Data Figure 8 | Dilation measurements in small arterioles
and comparison of dilation measurement techniques. a, A penetrating 
artery (#1, the responses of which are shown in the top panel of Fig. 1b) 
and its daughter branch (#2) in cat layer 2/3 labelled with Texas Red 
Dextran. Red lines indicate the position of the laser scan path across the 
vessels for line-scan diameter measurements. b, Individual line-scans are 
stacked next to each other to create X-time (XT) images. The four large 
rectangular panels are XT images of a blank and stimulus frame for each of 
two vessels shown in a. The small panels to the right are the average across 
the image (~0.96 s) for each of the four frames. The computed diameter 
values are also shown. These images were oversampled by interpolating 
between pixels (by 5 times for vessel 1 and by 20 times for vessel 2) before 
the diameter was calculated. c, The time courses and polar plots of the
responses for three different diameter measurements are shown for vessel
1—as a line-scan, a cross-section from a full-frame imaging run (seven 
trials), and the circle fit from the full-frame imaging run. In this particular 
example we used an ellipse rather than a circle because of the elongation 
of the vessel due to its diving obliquely to the imaging plane. d, Time 
courses of the vessel responses to preferred stimulus orientations for the 
three groups of vessels shown in Extended Data Fig. 4b. The responses 
for each vessel were aligned by stimulus onset and binned in 400-ms bins. 
The population average was then smoothed with a three-frame running 
average. Mean responses in dark colours and light bands indicate s.e.m. 
Note that the similar error bands and temporal profiles indicate that the 
smallest vessels had a similar quality of responses to the larger ones. 
Extended Data Figure 9 | Comparison of orientation selectivity in
regions of calcium responses with and without neuropil. a, In vivo
anatomical image of cells labelled with OGB-1 AM in cat visual cortex and
selection of two different masks for quantitative analysis of orientation
selectivity. Left, a 400-μm-diameter mask comprising soma pixels only.
Right, a 400-μm-diameter mask comprising all significantly responding
pixels (see Methods). b, The time courses of calcium responses computed
from the two masks. Time courses are averages of five trials, error bands
represent s.e.m. and grey bars represent the periods of visual stimulation.
c, For a population of 16 imaged regions (from 7 cats), the OSI was
computed with the two masks and found to be indistinguishable (cell
bodies only OSI mean ± s.e.m. = 0.46 ± 0.04; cell bodies and neuropil OSI
mean ± s.e.m. = 0.47 ± 0.04; P = 0.12, paired t-test).
Extended Data Figure 10 | Orientation-selective responses in layer 1
neurons and synapses. a, Region of cat visual cortex labelled with OGB-1
AM (to measure spiking activity) and SR101 (to distinguish astrocytes).
Note the much sparser density of neuronal cell bodies in layer 1 (left)
compared with the higher density of cells deeper in layer 2/3 (right). The
polar plots are the responses of the two layer 1 neurons labelled in the image.
b, Region of cat visual cortex labelled with iGluSnFR (to measure synaptic
activity). Again the density of cell bodies (the small black holes) in layer
1 (left) is much lower than in layer 2/3 (right). The polar plots are the
responses of a 400-μm- and 100-μm-diameter window of layer 1 glutamate
activity.
Towards clinical application of pronuclear transfer
to prevent mitochondrial DNA disease


Louise A. Hyslop, Paul Blakeley, Lyndsey Craven, Jessica Richardson, Norah M. E. Fogarty, Elpida Fragouli, Mahdi Lamb,
Sissy E. Wamaitha, Nilendran Prathalingam, Qi Zhang, Hannah O’Keefe, Yuko Takeda, Lucia Arizzi, Samer Alfarawati,
Helen A. Tuppen, Laura Irving, Dimitrios Kalleas, Meenakshi Choudhary, Dagan Wells, Alison P. Murdoch,
Douglass M. Turnbull, Kathy K. Niakan & Mary Herbert


Mitochondrial DNA (mtDNA) mutations are maternally inherited 
and are associated with a broad range of debilitating and fatal 
diseases'. Reproductive technologies designed to uncouple the 
inheritance of mtDNA from nuclear DNA may enable affected 
women to have a genetically related child with a greatly reduced 
risk of mtDNA disease. Here we report the first preclinical studies 
on pronuclear transplantation (PNT). Surprisingly, techniques 
used in proof-of-concept studies involving abnormally fertilized 
human zygotes? were not well tolerated by normally fertilized 
zygotes. We have therefore developed an alternative approach 
based on transplanting pronuclei shortly after completion of 
meiosis rather than shortly before the first mitotic division. 
This promotes efficient development to the blastocyst stage with 
no detectable effect on aneuploidy or gene expression. After 
optimization, mtDNA carryover was reduced to <2% in the 
majority (79%) of PNT blastocysts. The importance of reducing 
carryover to the lowest possible levels is highlighted by a progressive 
increase in heteroplasmy in a stem cell line derived from a PNT 
blastocyst with 4% mtDNA carryover. We conclude that PNT has 
the potential to reduce the risk of mtDNA disease, but it may not 
guarantee prevention. 

Predicting the risk of serious disease in children of women who 
carry mtDNA mutations is complicated by a number of factors. 
Mutations in mtDNA can be either homoplasmic (all copies of mtDNA 
are mutated) or heteroplasmic (mixture of mutated and wild-type 
mtDNA). In the case of heteroplasmy, women produce oocytes with 
widely varying mutation loads*. While pathogenicity is generally pro- 
portional to the ratio of mutated to wild-type mtDNA, the severity of 
disease for a given mutation load can vary, even among homoplasmic 
individuals*. The resulting unpredictability in the risk of transmitting 
disease raises profoundly difficult reproductive decisions for women 
from affected families. While preimplantation genetic diagnosis 
(PGD) can be used to reduce the risk of mtDNA disease by identify- 
ing embryos with low mutation loads*, it is not useful for women who 
are homoplasmic for pathogenic mtDNA mutations®. In such cases, it 
may be possible to reduce the risk of transmission by transplanting the 
oocyte nuclear DNA to an enucleated donor oocyte free of pathogenic 
mtDNA mutations. 

Progression through female meiosis offers a number of opportu- 
nities for transplanting nuclear DNA. Proof-of-concept studies”® 
indicate that transplantation of the nuclear genome between human 
oocytes arrested at metaphase of meiosis II (MII) is associated with 
a high incidence of abnormal fertilization’. An alternative approach 


is to transplant the nuclear genome after fertilization, when the hap- 
loid maternal and paternal genomes are separately packaged in large, 
clearly visible pronuclei. First performed in mouse zygotes more than 
three decades ago’, PNT is typically performed during the G2 phase 
of the 1st mitotic cell cycle. Using this approach, we have previously 
demonstrated that PNT between abnormally fertilized human zygotes 
is technically feasible”. However, their limited capacity for onward 
development has been a major barrier to further investigation of the 
therapeutic potential of PNT. 

Here we investigate the effect of PNT on normally fertilized human 
zygotes. We found that the procedures (Extended Data Fig. 1a, b) 
previously used for abnormally fertilized zygotes” resulted in reduced 
survival. Because developmental competence is correlated with accel- 
erated division to the two-cell stage!°, we asked whether the timing of 
PNT might be too close to the onset of 1st mitosis in normally ferti- 
lized zygotes (Fig. 1a). To address this, we undertook a series of exper- 
iments in which the pronuclei were transplanted shortly after they first 
appear (~8h after insemination; Fig. 1b and Supplementary Videos 
1, 2). Initially, we added sucrose to the enucleation medium to facil- 
itate enucleation and fusion by inducing shrinkage of the cytoplasm 
(Fig. 1b). However, this was later abandoned to reduce the karyoplast 
mtDNA content and had minimal effect on survival (see later). Our 
data indicate that early PNT (ePNT) promotes survival (92% versus 
59% for late PNT (ItPNT); P < 0.01; Fig. 1c). Moreover, ePNT zygotes 
showed normal pronuclei abuttal and division to the two-cell stage 
(Extended Data Fig. 1c, d), indicating that sperm centriole function 
was not disrupted'!. 

Blastocyst formation, which is essential for implantation, occurs at 
5-6 days after fertilization in vitro, and is marked by allocation of cells 
to the inner cell mass (ICM), or to an outer layer of trophectoderm 
cells'”. The morphology of the ICM and trophectoderm correlates well 
with implantation and is used to assess blastocyst quality in clinical 
in vitro fertilization (IVF) programmes (Extended Data Fig. 2a-d).
While the increased survival of ePNT zygotes (series I) resulted in 
improved blastocyst formation compared with ItPNT, both approaches 
produced few good quality blastocysts (Extended Data Fig. 2e, f). 
Control experiments in which pronuclei were replaced in the same 
zygote (autologous ePNT) indicated that blastocyst quality was com- 
promised by the manipulations (Fig. 2a and Extended Data Fig. 2f). 
To address this, we modified the manipulation medium, removing 
Ca²⁺ and Mg²⁺ and reducing by tenfold the concentration of the
fusogen, haemagglutinating virus of Japan envelope (HVJ-E)*. In addi- 
tion, we switched from a two-step to a single-step culture medium, 
Hill, London NW7 1AA, UK. 4Wellcome Trust Centre for Mitochondrial Research, Institute of Neuroscience, Newcastle University, The Medical School, Framlington Place, Newcastle upon Tyne NE2
Figure 1 | Early PNT promotes survival of normally fertilized zygotes
after PNT. a, Progression from MII arrest to completion of the 1st mitosis 
showing timings of ePNT and ItPNT. ICSI, intracytoplasmic sperm 
injection. b, Images show the steps involved in ePNT. Left, arrowheads 
indicate the pronuclei (PN). Middle, enucleation pipette inserted through 
a laser-induced opening in the zona pellucida (arrow, bottom). Bottom, 
enucleated zygote (cytoplast). Inset shows two karyoplasts, each consisting 
of a single pronucleus surrounded by a small amount of cytoplasm. Right, 
karyoplasts treated with HVJ-E and inserted under the zona pellucida. 
Bottom, arrow indicates removal of excess cytoplasm (see Supplementary 
Video 2). c, Survival of reconstituted ePNT and ItPNT zygotes (P < 0.01). 
Comparisons by χ² test.


in which embryos remained for the duration of culture. Under these 
conditions (ePNT series I), blastocyst formation and quality did 
not differ between unmanipulated controls and technical controls 
(Fig. 2a, b). Similarly, heterologous ePNT, which involved recipro- 
cal transfers between zygotes from fresh and vitrified oocytes, had 
no detectable effect on blastocyst quality (Fig. 2b, c). Consistent with
the improved quality, nuclear counts indicated that ePNT blastocyst
cell numbers were equivalent to controls (Extended Data Fig. 2g, h). 
However, heterologous ePNT resulted in reduced blastocyst formation 
(Fig. 2b), possibly due to an effect of vitrification at the MII stage’, 
which was not ameliorated by delaying vitrification until after exit 
from MII (Extended Data Fig. 3a-f). 

Analysis of aneuploidy by array-based comparative genomic hybrid- 
ization (array-CGH) indicated that while the majority of poor quality 
ePNT blastocysts were aneuploid for multiple chromosomes (Fig. 2d 
and Extended Data Fig. 4), the overall incidence of aneuploidy was 
comparable between ePNT and control blastocysts, and was similar 
to a reference data set of IVF blastocysts, in which female age was 
matched to the karyoplast donors (Fig. 2e). These data indicate that 
the ePNT procedure does not result in an increased incidence of ane- 
uploid blastocysts. 

We next determined whether ePNT alters the pattern of gene 
expression in human blastocysts by performing RNA sequencing 
(RNA-seq) on single cells microdissected from ePNT and control blas- 
tocysts (Extended Data Fig. 5a, b). For reference, we also included a 
previously published series of unmanipulated blastocysts'4. The ePNT 
blastocysts included in these experiments were generated by fusion of 
cytoplasts and karyoplasts with the same (autologous and homologous 
ePNT), or different (heterologous ePNT), mitochondrial genotypes 
(Extended Data Fig. 5b). 

To test for differences in global gene expression, we performed 
principal component analysis (PCA) on normalized RNA-seq data 
(Extended Data Fig. 5c). We first determined whether PCA is suffi- 
ciently sensitive to detect differences in global gene expression between 
good and poor quality blastocysts. Plotting PC1 against PC2, which 
together account for the largest contributions to variation in global 
gene expression, revealed a high proportion of outliers among samples 
from poor quality blastocysts (Fig. 3a, b). By contrast, samples from 
good quality ePNT blastocysts clustered closely with controls (Fig. 3c). 
To determine whether additional principal components, however 
minor, might distinguish differences between good quality ePNT 
and control blastocysts, we plotted all combinations of the first ten 
principal components. In each combination we found that ePNT 
samples cluster with controls (Extended Data Fig. 6a). Furthermore, 
we were able to distinguish distinct populations of cells corresponding
to the three cell lineages of the mammalian blastocyst’ (Extended 
Data Fig. 6b). This was confirmed by t-distributed stochastic neigh- 
bour embedding (t-SNE), a nonlinear method for dimensionality 
reduction (Extended Data Fig. 6c). Consistent with this, unsu- 
pervised hierarchical clustering revealed that ePNT and control 
samples cluster together on the basis of lineage (Extended Data 
Fig. 7a). Together, these findings indicate that single-cell RNA-seq 
reliably detects differences in gene expression, and that global and 
lineage-associated gene expression is indistinguishable between 
control and ePNT blastocysts. 

To address the question of whether ePNT specifically affects expres- 
sion of mtDNA-encoded oxidative phosphorylation (OXPHOS) 
genes, we generated a heatmap after unsupervised hierarchical clus- 
tering. This revealed wide variation in the level of mtDNA OXPHOS 
gene expression within and between ePNT and control blastocysts. 
However, samples from both groups clustered together, irrespective 
of whether the karyoplast and cytoplast contained the same, or differ- 
ent, mitochondrial genomes (Extended Data Fig. 7b). This suggests 
that switching nuclear genomes does not alter mitochondrial gene 
expression. 

On the basis of evidence from a range of pathogenic mutations, the 
probability of developing or transmitting disease is low when muta- 
tion loads are <18% (ref. 16) and <5% (ref. 17), respectively. Thus, 
reducing the contribution of karyoplast mtDNA to <5% has the poten- 
tial to prevent transmission to subsequent generations. The level of 
mtDNA carryover during transplantation of pronuclei was measured 
by pyrosequencing (Extended Data Fig. 8a-c) using clumps of cells
from day 6 ePNT blastocysts (n = 40) generated by reciprocal transfer 
between zygotes from fresh and vitrified oocytes (Fig. 4a). Despite 
removal of excess cytoplasm before karyoplast fusion (Supplementary 
Video 2 and Fig. 1b), we found that heteroplasmy was >5% in a high 
proportion (28%) of samples enucleated in the presence of sucrose. 
This was significantly reduced by the omission of sucrose (Fig. 4b and 
Extended Data Fig. 8d), probably due to an osmotic effect. Carryover 
of mtDNA was further reduced in blastocysts whose cytoplasts orig- 
inated from freshly harvested rather than vitrified oocytes (Fig. 4c), 
which may be explained by a high incidence of cytoplasmic leakage 
from the latter. Thus, the efficacy of ePNT in preventing mtDNA 
a, Reciprocal ePNT between zygotes from fresh and vitrified oocytes,
experimental groups: FreshCy + sucrose; VitCy + sucrose. b, mtDNA
carryover in day 6 blastocysts arising from ePNT in the presence or
two-sided Mann-Whitney U-test). c, mtDNA carryover in day 6
blastocysts (FreshCy and VitCy) after ePNT in the presence or absence
of sucrose; horizontal lines represent medians (P values shown, two-sided
Mann-Whitney U-test). Stacked graph showing the percentage of samples
with specified levels of heteroplasmy. Groups with different letters have
significantly different proportions of samples with undetectable levels
of carryover (*P < 0.05; **P < 0.01; χ² test). b, c, Data points represent
means of 2-3 technical replicates; n = number of samples, numbers of
blastocysts as shown in c. d, mtDNA carryover in PNT-hES cell lines
(n = 5). TE, trophectoderm. Data points represent 2-3 technical replicates.
n = number of samples tested for each passage. Source data files are
available online for b-d.

disease is likely to be increased by vitrifying patient rather than 
donor oocytes. On the basis of our findings, this approach results in 
<2% heteroplasmy in the majority (79%) of blastocysts and none with 
>5% heteroplasmy (Extended Data Fig. 8e, f). Notably, those with >2% 
heteroplasmy were predicted by technical problems such as leakage 
from the cytoplast or inadequate shearing of cytoplasm from the 
karyoplast. Such factors could be taken into account when selecting 
embryos for use in clinical treatment. 
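
The carryover thresholds used in this paragraph (<2% and >5% heteroplasmy) lend themselves to a simple binning step, sketched below. The names `classify_carryover` and `summarize` are hypothetical and no measured values are reproduced. The text does not say how a value of exactly 2% is treated, so placing it in the middle bin is an assumption.

```python
from collections import Counter
from typing import Dict, Sequence

BINS = ("<2%", "2-5%", ">5%")


def classify_carryover(pct: float) -> str:
    """Assign a heteroplasmy percentage to one of the bins discussed in the text.

    Assumption: exactly 2% falls in the middle bin and exactly 5% does too.
    """
    if pct < 2.0:
        return "<2%"
    if pct <= 5.0:
        return "2-5%"
    return ">5%"


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Return the fraction of samples falling in each carryover bin."""
    counts = Counter(classify_carryover(v) for v in values)
    n = len(values)
    return {label: counts.get(label, 0) / n for label in BINS}


# Illustrative input only; these are not values from the study.
print(summarize([0.0, 0.5, 1.9, 2.0, 4.0]))
```

Applied to per-blastocyst pyrosequencing results, a summary of this form gives the proportion of embryos below the 2% level and flags any above the 5% level associated with transmission risk.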

To assess the potential fate of karyoplast mtDNA under conditions 
in which it can replicate'®, we derived human embryonic stem (hES) 
cell lines (n =5) from ePNT blastocysts (Extended Data Fig. 9a-f). 
While all PNT-hES cell lines showed low levels of heteroplasmy at 
passage 1 (P1), one line (36PNT-hES), derived from a blastocyst with 
4% mtDNA carryover, showed an upward drift with wide variation in 
heteroplasmy between colonies by P12 (Fig. 4d). This was confirmed 
by experiments in which individual colonies were subcloned and 
cultured for multiple passages (Extended Data Fig. 10). Interestingly, 
the karyoplast and cytoplast donors for 36PNT belonged to the same
mtDNA haplogroup (Extended Data Fig. 9f); however, we cannot
exclude the possibility that sequence variants in the karyoplast donor 
mtDNA might have conferred a replicative advantage’®. While the 
biological basis remains to be established, the relevance of the 
increased heteroplasmy to development in vivo is unclear. For 
example, recent reports indicate that pluripotent cells derived from 
heteroplasmic fibroblasts exhibit a bimodal drift towards homoplasmy, 
which is not observed in the parental lines*”?!. Moreover, with the 
exception of one controversial case?3, a number of reports*”®, 
together with our own unpublished data, indicate that the level of 
heteroplasmy in preimplantation embryos mirrors that in babies born 
after PGD. Nonetheless, the finding underscores the importance of 
reducing mtDNA carryover to the lowest possible levels and suggests 
that guaranteed prevention of disease will depend on complete 
elimination of karyoplast mtDNA. 

The work reported here represents a considerable advance towards 
understanding the therapeutic potential of PNT in preventing 
transmission of mtDNA disease. Transplanting the pronuclei shortly 
after completion of meiosis resulted in improved survival. Further 
optimization of enucleation and embryo culture procedures promoted 
development of good quality blastocysts whose gene expression and 
incidence of aneuploidy did not differ from controls. Our findings 
also indicate that vitrification of patient rather than donor oocytes 
will probably minimize mtDNA carryover. This offers the added 
advantage of stockpiling patient oocytes before they become 
susceptible to age-related meiotic aneuploidy”®. Given the low levels 
of mtDNA carryover using optimized procedures, we believe that 
ePNT has the potential to reduce risk of mtDNA disease. However, 
until more is known about the postimplantation fate of karyoplast 
mtDNA, it should be considered in combination with prenatal 
screening. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


Human oocytes and manipulations. The study was approved by the Newcastle 
and North Tyneside Research Ethics Committee and was licensed by the UK 
Human Fertilisation and Embryology Authority (HFEA). Informed consent was 
obtained from all donors by research nurses who were not directly involved in the 
research, or in the clinical treatments of women participating in the study. Human 
oocytes (n = 523) included in this study were donated either by women undergoing 
infertility treatment (n = 44 oocytes from 6 donors, age range 25-36 years) as 
part of an ‘egg sharing’ programme (ref. 27), or by non-patient donors (n = 479 oocytes 
from 57 donors, age range 21-36 years). Donors were compensated in accordance 
with current HFEA guidance on payments for donors (ref. 28). Non-patient donors 
received financial compensation of £500 per donation cycle. Compensation under 
the ‘egg share’ programme consisted of a subsidy (£1,500) from research funds 
towards the cost of treatment for self-funded patients (ref. 27), or an additional fully 
funded treatment cycle for those who did not become pregnant after NHS-funded 
treatment. 

Oocytes were collected by ultrasound-guided follicle aspiration and the sur- 
rounding cumulus cells were removed using hyaluronidase (HYASE; Vitrolife). 
MII oocytes were identified by the presence of the first polar body and were fertilized 
by intracytoplasmic sperm injection (ICSI) using sperm donated specifically 
for this project. The experiments were not randomized. The investigators were 
not blinded to allocation during experiments and outcome assessment, except for 
aneuploidy and gene expression analysis. 

Oocyte vitrification. MII oocytes were either vitrified or used immediately for 
PNT experiments. The majority (n = 107) of vitrified oocytes were vitrified at the 
MII stage. We also conducted a series of experiments in which vitrification was 
performed after completion of MII (at the 2PB stage; ~5.5h post-ICSI; n= 34), 
to determine whether blastocyst development might be improved. Vitrification 
and warming were performed using the RapidVit and RapidWarm oocyte kits 
(Vitrolife, Sweden). Oocytes were stored in liquid nitrogen until required. 

PNT. PNT was performed either at 16-20 h after ICSI (ltPNT), or at ~8-10 h after 
ICSI (ePNT). In the case of ePNT, two main series of experiments (series I and 
series II) were performed. A total of 51 zygotes from 10 donors were used for ltPNT 
(n = 12 controls; n = 39 ltPNT). For ePNT experiments, we used 58 zygotes from 
13 donors in series I (n= 19 controls; n= 39 ePNT), and 131 zygotes from 30 donors 
in series II (n= 30 controls; n= 101 ePNT). Thirty-four zygotes from 13 donors 
were used for ePNT experiments involving oocytes vitrified at the 2PB stage. 

Two types of PNT experiments were conducted: (1) autologous PNT, which 
involved removal and replacement of pronuclei in the same zygote, was performed 
to distinguish between technical and biological effects; (2) heterologous PNT 
involved reciprocal transfer between pairs of zygotes, either from the same or dif- 
ferent donors. Heterologous PNT between zygotes from different donors involved 
reciprocal transfer between zygotes originating from fresh and vitrified oocytes. 
This gave rise to reconstituted zygotes consisting of cytoplasts from fresh oocytes 
and karyoplasts from vitrified oocytes, or vice versa. These combinations are 
termed FreshCy and VitCy, respectively (see Extended Data Fig. 4a). In one set of 
ePNT experiments, which gave rise to a single ePNT blastocyst, the two donors 
were sisters and therefore have the same mitochondrial genotype. For the purpose 
of the gene expression experiments, these are referred to as homologous transfers. 

The PNT procedure was performed in an isolator-based workstation 
(Vitrosafe) with temperature, CO2 and O2 control (ref. 29) containing an inverted microscope 
(TE2000-U, Nikon) fitted with micromanipulators (Integra Ti, Research 
Instruments) and a laser objective (Saturn Active, Research Instruments). PNT 
procedures took ~15 min to complete and involved the following steps. First, 
zygotes in which 2PN were visible were placed in enucleation medium with 
cytoskeletal inhibitors. In all cases, nocodazole (10 µg ml⁻¹) was used to depolymerize 
microtubules. In ltPNT experiments we used either cytochalasin B 
(5 µg ml⁻¹) or latrunculin A (2.5 µM or 5 µM) to disable the actin cytoskeleton. 
We subsequently used latrunculin A (2.5 µM) for all ePNT experiments. For ltPNT 
and ePNT (series I) experiments, enucleation was performed in G-1 Plus medium 
(Vitrolife). We used Sydney IVF Embryo Biopsy Medium (Cook Medical), which 
does not contain Ca²⁺ and Mg²⁺, for ePNT (series II) experiments. Enucleation 
was performed in the presence or absence of sucrose (0.125 M). Addition of 
sucrose increased the osmolarity of the enucleation medium from 280 mOsm l⁻¹ 
to 449 mOsm l⁻¹, which induced shrinkage of the cytoplasm, thereby facilitating 
enucleation. Second, a laser objective (Saturn Active, Research Instruments) was 
used to create an opening in the zona pellucida for insertion of the enucleation/ 
fusion pipette. The inner diameter pipette measurements were 25-35 µm for 
ltPNT, and 17 µm for ePNT. Third, the pronuclei, surrounded by a small amount 
of cytoplasm, were aspirated into the enucleation pipette, either as a single karyo- 
plast, or as two separate karyoplasts (see Supplementary Videos 1 and 2). Fourth, 
karyoplasts were briefly exposed to a suspension of HVJ-E; GenomONE-CF Ex 
(Cosmo Bio). Undiluted suspension was used for ltPNT and ePNT (series I) and 
a 1:10 dilution was used for ePNT (series II). Fifth, the pipette containing the 
karyoplasts was inserted through the laser-drilled opening in the zona pellucida 
and karyoplasts were gently expelled into the perivitelline space and allowed to 
fuse with the cytoplast. Sixth, reconstituted and control zygotes were cultured 
either in a sequential medium, G-1 Plus (day 1-3)/G-2 Plus (day 3-6) (Vitrolife; 
ltPNT and ePNT (series I)), or in the single-step G-TL medium (Vitrolife; ePNT 
(series II)) from day 1 to 6. Time-lapse embryo imaging was performed for three 
sets of ePNT (series II) experiments using the Primo Vision Time-lapse moni- 
toring system (Vitrolife). 

Overview of experiments on PNT zygotes. Survival of reconstituted zygotes 
was initially compared between ltPNT and ePNT (series I) and was subsequently 
recorded for all ePNT zygotes. In ePNT (series II), the first mitotic division was 
monitored by time-lapse imaging in three sets of experiments. All zygotes submit- 
ted to PNT were included in the analysis of development to the blastocyst stage. 
Zygotes (controls and PNT) that developed to the blastocyst stage were graded 
and included in the analysis of blastocyst quality. Blastocyst formation and grade 
were assessed on day 6 for ltPNT, and on days 5 and 6 for ePNT, except in the 
case of two ePNT (series II) experiments, which were assessed only on day 6. 
These experiments are not included in the day 5 analysis shown in Fig. 2b and in 
Extended Data Fig. 3d-f. 

Blastocyst cell counts were performed primarily to gain insight into the causes 
of poor blastocyst quality in the ltPNT and ePNT (series I). Data on blastocyst cell 
counts were obtained from ltPNT (n = 6) and ePNT: series I (n = 8) and series II 
(n=5). Further analyses, including aneuploidy, gene expression, mtDNA car- 
ryover, and hES cell derivation were conducted on series II blastocysts only. 
Where possible, we performed multiple investigations on individual blastocysts. 
In accordance with our Local Research Ethics Committee approval and HFEA 
licence, these were performed on day 6. Unmanipulated control blastocysts and 
ePNT blastocysts were used for aneuploidy and gene expression, or aneuploidy 
and hES cell derivation. The blastocyst grades shown for each of these analyses 
refer to the grades on day 6. The numbers of blastocysts used for each set of 
experiments were: aneuploidy screening (ePNT: n = 30 from 20 donors; control: 
n = 11 from 10 donors), gene expression analysis (ePNT: n = 11 from 10 donors; 
control: n = 3 from 3 donors), mtDNA carryover (ePNT: n = 40 from 28 donors), 
hES cell derivation (ePNT: n = 15 from 13 donors; control: n = 6 from 4 donors). 

Embryo grading. Embryos were graded using the UK National External Quality 
Assessment Service (NEQAS) grading schemes for embryos and blastocysts (ref. 30). 
Blastocysts were assigned a three-digit grade representing a score of: 1-6 for the 
extent of blastocoel expansion, 1-5 for the inner cell mass appearance and 1-3 for 
the trophectoderm appearance (ref. 31). The grade was converted to a quality category 
using the table in Extended Data Fig. 2c. 
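
As an illustration of the three-digit grade described above, the following Python sketch splits a grade into its three component scores and checks each against the stated range. The digit order (expansion, inner cell mass, trophectoderm) and the names BlastocystGrade and parse_grade are assumptions made for this example; the conversion to a quality category depends on the table in Extended Data Fig. 2c and is not reproduced here.

```python
from typing import NamedTuple


class BlastocystGrade(NamedTuple):
    expansion: int        # extent of blastocoel expansion, 1-6
    inner_cell_mass: int  # inner cell mass appearance, 1-5
    trophectoderm: int    # trophectoderm appearance, 1-3


# Allowed score ranges as stated in the text.
SCORE_RANGES = {
    "expansion": (1, 6),
    "inner_cell_mass": (1, 5),
    "trophectoderm": (1, 3),
}


def parse_grade(code: str) -> BlastocystGrade:
    """Split a three-digit grade such as '543' into its component scores.

    The digit order is an assumption of this sketch.
    """
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"expected three digits, got {code!r}")
    grade = BlastocystGrade(*(int(ch) for ch in code))
    for name, (low, high) in SCORE_RANGES.items():
        value = getattr(grade, name)
        if not low <= value <= high:
            raise ValueError(f"{name} score {value} is outside {low}-{high}")
    return grade
```

Validating each digit separately means that a transcription error such as a trophectoderm score of 4 is rejected rather than silently mapped to a category.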

Blastocyst nuclear counts. Day 6 blastocysts were fixed using 4% PFA at pH 
7.4. Nuclear staining was carried out using DAPI (Vectashield). Blastocysts were 
imaged using an inverted confocal microscope (Nikon A1R) with a ×20 objective 
(Plan Apo, Nikon) and NIS-elements image software. Z-steps were taken at ~1 µm 
intervals and nuclear counts performed using ImageJ software. 

Aneuploidy screening. Clumps of cells were harvested from ePNT blastocysts 
for whole-genome amplification followed by microarray-CGH analysis according 
to a previously validated protocol using 24Sure Cytochip (Illumina). Cells were 
obtained from the trophectoderm, ICM or both. Lysis and whole-genome ampli- 
fication was performed using the SurePlex kit (Illumina) according to the manu- 
facturer’s instructions and blind to sample origin. Samples from ePNT blastocysts 
were labelled with Cy3 while a commercially available reference 46,XY DNA was 
labelled with Cy5 (Illumina) (ref. 32). A laser scanner (InnoScan 710, Innopsys) was used 
to analyse the microarrays after washing and drying. The resulting images were 
analysed using BlueFuse Multi analysis software (Illumina). 

Gene expression analysis by single-cell RNA-seq. 

Blastocyst disaggregation. Blastocyst disaggregation was performed using an 
Olympus IX73 microscope and a Saturn 5 laser (Research Instruments) as 
described previously!*. Embryos were placed in drops of G-MOPS solution 
(Vitrolife) on a Petri dish overlaid with mineral oil for micromanipulation. 
The separated ICM and polar trophectoderm were washed in Ca²⁺- and Mg²⁺-free 
PBS (Invitrogen) and incubated in 0.05% trypsin/EDTA (Invitrogen) 
for 5-10 min. Trypsin was quenched using Global Media supplemented with 
5 mg ml⁻¹ LifeGlobal Protein Supplement. Single cells were isolated using a 30-µm 
inner diameter blastomere biopsy pipette (Research Instruments). 

cDNA synthesis and amplification. cDNA was synthesized using SMARTer Ultra 
Low Input RNA for Illumina Sequencing-HV kit (Clontech Laboratories) accord- 
ing to the manufacturer’s guidelines and as previously published'*. cDNA was 
sheared using Covaris S2 with the modified settings 10% duty, intensity 5, burst 
cycle 200 for 2 min. Libraries were prepared using Low Input Library Prep Kit 
(Clontech Laboratories) according to the manufacturer’s instructions. Library 
quality was assessed with an Agilent 2100 BioAnalyser and concentration meas- 
ured with a Qubit 2.0 Fluorometer (Life Technologies). Libraries were submitted 
for 50-bp paired-end sequencing using standard Illumina adapters on Illumina 
HiSeq 2500. 
RNA-seq data analysis. The quality of the RNA-seq data was evaluated using the 
FastQC tool. Samples with primer contamination and amplification bias, identi- 
fied by an unequal proportion of ATGC nucleotide percentages, were excluded 
from subsequent analysis. Reads were aligned to the human genome sequence 
hg19 using Tophat2 (ref. 33), and samples with low percentage mapping (<50%) 
were excluded from subsequent analysis. The number of reads mapping uniquely 
to each gene was counted using the program htseq-count (ref. 34). The individual count 
files for each sample were normalized using both the RPKM function in the edgeR 
package (ref. 35) and a variance-stabilizing transformation from the DESeq2 package (ref. 36). 
A principal component analysis (PCA) of the top 12,000 most variably 
expressed genes was performed blind to sample origin on all ePNT and control 
samples to investigate differences in global gene expression. The PCA was generated 
using the R function prcomp, using both the scaling and centering options. 
A subsequent PCA excluded samples below grade C, which were generally 
aneuploid for multiple chromosomes. An R script was used to perform unsu- 
pervised hierarchical clustering and to generate a heatmap using the pheatmap 
R package. An alternative approach for data dimensionality reduction was per- 
formed using the t-SNE algorithm)». The top 5 principal components of the 
VST-normalized count data were used as input for the R implementation of 
t-SNE. DESeq2 was applied to the read counts for the ePNT and control data to 
identify differentially expressed genes in the primitive endoderm, trophectoderm 
and epiblast samples. 
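
To make two of the steps above concrete, the following Python sketch computes RPKM values for a single sample and then ranks genes by their variance across samples, as would be done before a PCA restricted to the most variable genes. It is a minimal illustration rather than the pipeline used in this study: the function names are hypothetical, the library size is approximated by the sum of the counted reads, and the PCA itself (centred and scaled) is left to a dedicated routine.

```python
from statistics import pvariance


def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase of gene per million mapped reads for one sample.

    counts: {gene: uniquely mapped read count}
    gene_lengths_bp: {gene: gene length in base pairs}
    Assumption: the library size is taken as the sum of the counted reads.
    """
    library_size = sum(counts.values())
    return {
        gene: n_reads * 1e9 / (library_size * gene_lengths_bp[gene])
        for gene, n_reads in counts.items()
    }


def top_variable_genes(expression_by_sample, n_top):
    """Return the n_top genes with the largest variance across samples.

    expression_by_sample: {sample: {gene: normalized expression value}}
    """
    samples = list(expression_by_sample.values())
    variance = {
        gene: pvariance([sample[gene] for sample in samples])
        for gene in samples[0]
    }
    return sorted(variance, key=variance.get, reverse=True)[:n_top]
```

In this sketch a call such as top_variable_genes(expression, 12000) would select the gene set passed on to the PCA; the choice of 12,000 genes comes from the text, whereas everything else here is illustrative.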
mtDNA carryover analysis. 
mtDNA extraction and mtDNA sequencing. The control region of the mitochon- 
drial genome from oocyte donors was sequenced using ovarian follicular cells har- 
vested at the time of oocyte retrieval, or cumulus cells removed from the oocyte 
in preparation for ICSI. DNA extraction from follicular cells was performed using 
the QIAamp DNA Mini kit according to the manufacturer’s instructions (Qiagen). 
Cumulus cells were lysed for 2 h in a lysis buffer (50 mM Tris-HCl, pH 8.5, 1 mM 
EDTA, 0.5% Tween-20 and 200 µg ml⁻¹ proteinase K) at 55 °C. The enzyme was 
then inactivated by incubation at 95 °C for 10 min. The control region of the 
mitochondrial genome was amplified as described previously (ref. 37) with the following 
modification: secondary PCR reactions were performed with four sets of overlap- 
ping M13-tailed primers (primer nucleotide positions, D1F: 15758-15777 and 
D1R: 019-001; D2F: 16223-16244 and D2R: 129-110; D3F: 16548-16569 and 
D3R: 389-370; D4F: 323-343 and D4R: 771-752) with an annealing temperature 
of 58°C. PCR products were purified using TSAP (Promega) then sequenced 
on an ABI3130 Genetic Analyser (Applied Biosystems) with BigDye Terminator 
cycle sequencing chemistries (v.3.1, Applied Biosystems). Sequences were directly 
compared to the revised Cambridge Reference Sequence for human mtDNA (ref. 38) 
(GenBank accession number AC_000021.2) using SeqScape software (v.2.1.1, 
Applied Biosystems). 
Generation of heteroplasmic control DNA. The mtDNA control region containing 
either the wild-type or polymorphic nucleotide of interest was amplified using 
PCR primers (primer nucleotide positions: forward primer, 16016-16036; reverse 
primer, 571-552) with an annealing temperature of 58°C. PCR products ampli- 
fied from ovarian follicular cells were purified using the Agencourt AMPure XP 
purification system (Beckman Coulter) according to the manufacturer’s instructions. 
PCR products amplified from cumulus cells were gel purified (QIAquick 
Gel Extraction kit, Qiagen) and cloned using the pGEM-T Easy Vector System 
(Promega) according to the manufacturer's instructions. Plasmid DNA was isolated 
using the QIAprep Spin Miniprep kit (Qiagen). Quantitative real-time PCR was 
performed using Platinum SYBR Green qPCR SuperMix-UDG (Invitrogen) and 
PCR primers (forward primer, L16016-16036; reverse primer, H16186-16167) to 
accurately determine the DNA concentration. Equimolar concentrations of DNA 
containing the wild-type or polymorphic nucleotide of interest were then com- 
bined in varying ratios to generate a range of heteroplasmic controls. 
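
Because the mtDNA control region spans the origin of the reference numbering, the primer positions given above wrap from the end of the circular genome back to position 1. The following Python sketch shows how an amplicon length can be derived from such positions, assuming the revised Cambridge Reference Sequence numbering of 1 to 16,569. The function name amplicon_length is hypothetical and the calculation is illustrative only.

```python
MTDNA_LENGTH = 16569  # positions in the rCRS numbering (1 to 16,569)


def amplicon_length(forward_start, reverse_start, genome_length=MTDNA_LENGTH):
    """Length of a PCR product on a circular genome.

    forward_start: 5' position of the forward (light-strand) primer
    reverse_start: 5' position of the reverse (heavy-strand) primer
    If the reverse primer position is numerically smaller, the product is
    assumed to run through the end of the numbering and continue from 1.
    """
    if reverse_start >= forward_start:
        return reverse_start - forward_start + 1
    return (genome_length - forward_start + 1) + reverse_start
```

Under these assumptions, the control-region primers (forward 16016-16036, reverse 571-552) give amplicon_length(16016, 571) = 1,125 bp, and the qPCR primers (L16016-16036, H16186-16167) give amplicon_length(16016, 16186) = 171 bp.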
Pyrosequencing. Quantitative pyrosequencing was used to measure mtDNA car- 
ryover in samples from ePNT blastocysts and PNT-hES cell lines. Locus-specific 
PCR and a pyrosequencing primer were designed for each polymorphic nucleotide 
of interest in the mtDNA using PyroMark Assay Design Software v.2.0 (Qiagen). 
Clumps of cells from ePNT blastocysts and from PNT-hES cell lines were lysed 
for 2 h in a lysis buffer (50 mM Tris-HCl, pH 8.5, 1 mM EDTA, 0.5% Tween-20 and 
200 µg ml⁻¹ proteinase K) at 55 °C. The enzyme was then inactivated by incubation 
at 95 °C for 10 min. mtDNA amplification was performed before pyrosequencing 
analysis and pyrosequencing performed on the PyroMark Q24 and PyroMark 
Q96 instruments according to the manufacturer’s instructions. Quantification of 
the heteroplasmy level was achieved using the instrument software to directly 
compare the relevant peak heights of both the wild-type and polymorphic nucleotides 
at the relevant position (ref. 42). A standard curve was generated by plotting 
expected heteroplasmy level against actual heteroplasmy level for the control 
samples. The standard curve was used to determine the level of heteroplasmy in 
the blastocyst and PNT-hES cell samples. 
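
The quantification steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the instrument software: heteroplasmy is taken as the variant peak height as a percentage of the summed peak heights, the standard curve is assumed to be a straight line fitted by least squares to the control mixtures, and all function names and any example values are hypothetical.

```python
def heteroplasmy_percent(peak_wild_type, peak_variant):
    """Variant peak height as a percentage of both peaks at one position."""
    return 100.0 * peak_variant / (peak_wild_type + peak_variant)


def fit_standard_curve(expected, measured):
    """Least-squares line measured = intercept + slope * expected.

    expected: known heteroplasmy (%) of the control mixtures
    measured: heteroplasmy (%) read from pyrosequencing of those controls
    Assumption: the relationship is linear over the range of the controls.
    """
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, measured))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrated_heteroplasmy(measured_value, slope, intercept):
    """Invert the standard curve to estimate the true heteroplasmy level."""
    return (measured_value - intercept) / slope
```

In use, the slope and intercept would be fitted once per polymorphic site from its own control mixtures and then applied to each blastocyst or PNT-hES cell measurement at that site.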

Mitochondrial haplogroups. Haplogroups were determined by next-genera- 
tion sequencing analysis of whole mtDNA, amplified in two overlapping 9-kb 
fragments using primers L550-569 and H9839-9819 (set 1) and L9592-9611 
and H645-626 (set 2), on an Ion Torrent Personal Genome Machine (Life 
Technologies) (ref. 39). Protocol modifications included use of a OneTouch 2 system 
and HiQ OT2 and sequencing kits. Samples were processed on Ion 316 chips and 
analysed with Torrent Suite Variant Caller plugin (v.4.6). 

ES cell derivation. For ICM isolation, day 6 blastocysts were dissected in the 
embryo culture dish. The blastocysts were held in position by a holding pipette 
(Vitrolife) with the ICM at 3 o’clock. The ICM was held in position with a biopsy 
pipette (Origio, catalogue no. MPB-FP-30) and isolated from the majority of the 
trophectoderm cells by laser pulses (Saturn 5 Active, Research Instruments). 
The isolated ICMs were plated on inactivated feeders MEF (CF1 s), in medium 
containing KnockOut Serum Replacement (Invitrogen, catalogue no. 10828), 
KnockOut DMEM (Invitrogen, catalogue no. 10829), NEAA (Invitrogen, 
catalogue no. 11140), bFGF2 (Invitrogen, catalogue no. 13256), Glutamax 
(Invitrogen, catalogue no. 35050) and 2-mercaptoethanol (Invitrogen, catalogue 
no. 21985). Outgrowths were mechanically dissected from the surrounding cells 
and plated on fresh feeder cells; all subsequent passaging was performed mechanically. 
Embryonic stem cells were adapted to feeder-free culture conditions and 
maintained in mTeSR1 (STEMCELL Technologies) on growth-factor-reduced 
Matrigel (BD Biosciences). The hES cell lines generated during the course of this 
study will be deposited with the UK Stem Cell Bank (UKSCB). All the necessary 
tests, including mycoplasma testing, will be performed by the UKSCB prior to 
distribution of the cell lines. 

Quantitative RT-PCR. RNA was isolated using TRI Reagent (Sigma) and treated 
with DNase I (Ambion). cDNA was synthesized using a Maxima First Strand cDNA 
Synthesis Kit (Fermentas). qRT-PCR was performed using Quantace Sensimix on 
an Applied Biosystems 7500 machine (Life Technologies). Primer pairs used were 
previously published (ref. 40) and are as follows: NANOG forward 
GCAACCTGAAGACGTGTGAA, reverse CTCGCTGATTAGGCTCCAAC; POU5F1 forward 
TATGGGAGCCCTCACTTCAG, reverse CAAAAACCCTGGCACAAACT; SOX2 forward 
TTGTTCGATCCCAACTTTCCG, reverse ACATGGATTCTCGGCAGACT. 
Immunohistochemistry and imaging. Samples were fixed in 4% paraformalde- 
hyde at 4°C overnight, permeabilized with 0.5% Tween in 1x PBS for 20 min and 
blocked with 10% FBS diluted in 0.1% Tween in 1x PBS for 1 h. Primary antibodies 
were diluted in blocking solution as indicated: AFP (Dako, A0008, 1:500), desmin 
(Neomarkers/Thermo, RB-9014-P, 1:50), NANOG (R&D, AF1977, 1:500), OCT4 
(Santa Cruz, SC-5279, 1:500), SMA (Sigma, A5228, 1:250), SOX1 (R&D, AF3369, 
1:500), SOX2 (Cell Signaling, 23064, 1:500), SOX17 (R&D, AF1924, 1:500), SSEA4 
(DSHB, MC-813-70, 1:100), TUJ1 (Sigma, T2200, 1:500). Samples were incubated 
at 4°C rotating overnight. Alexa Fluor secondary antibodies (Invitrogen, anti- 
mouse A21202, A21203; anti-rabbit A21206, A21207; anti-goat A11055, A11058) 
were diluted 1:300 in blocking solution and samples incubated for 1h at room tem- 
perature, then washed and covered with 0.1% Tween in 1x PBS containing DAPI 
Vectashield mounting medium (Vector Labs). Images were taken on an Olympus 
IX73 microscope with Cell^F software (Olympus Corporation). 

Statistical analysis. Data were analysed using one-way ANOVA with Tukey’s 
HSD test, Mann-Whitney U-test, χ² test and Fisher’s exact test, as indicated in 
the figure legends. RNA-seq data were analysed by PCA using either RPKM- or 
VST-normalized counts. 
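
For readers unfamiliar with the categorical tests listed above, the following Python sketch implements a two-sided Fisher's exact test for a 2x2 table using only the standard library. It illustrates the technique and is not the software used in this study. The function name is hypothetical, and the two-sided P value is defined here as the sum of the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, two experimental groups and columns the
    numbers of zygotes that did or did not reach the blastocyst stage.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over tables at least as extreme (no more probable) than the observed;
    # the small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
```

The exact test is generally preferred over the χ² test when expected cell counts are small, as is often the case with limited numbers of embryos per group.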


27. Choudhary, M. et al. Egg sharing for research: a successful outcome for 
patients and researchers. Cell Stem Cell 10, 239-240 (2012). 

28. HFEA Guidance on Payments for Donors. HFEA Code of Practice Section 13 
http://www.hfea.gov.uk/500.html (Human Fertilisation and Embryology 
Authority, 2009). 

29. Hyslop, L. et al. A novel isolator-based system promotes viability of human 
embryos during laboratory processing. PLoS ONE 7, e31010 (2012). 

30. Cutting, R., Morroll, D., Roberts, S. A., Pickering, S. & Rutherford, A. Elective 
single embryo transfer: guidelines for practice British Fertility Society and 
Association of Clinical Embryologists. Hum. Fertil. (Camb.) 11, 131-146 (2008). 

31. Stephenson, E. L., Braude, P. R. & Mason, C. International community 
consensus standard for reporting derivation of human embryonic stem cell 
lines. Regen. Med. 2, 349-362 (2007). 

32. Wells, D. et al. Clinical utilisation of a rapid low-pass whole genome sequencing 
technique for the diagnosis of aneuploidy in human embryos prior to 
implantation. J. Med. Genet. 51, 553-562 (2014). 


© 2016 Macmillan Publishers Limited. All rights reserved 


33. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the 
presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 
(2013). 

34. Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with 
high-throughput sequencing data. Bioinformatics 31, 166-169 (2015). 

35. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package 
for differential expression analysis of digital gene expression data. 
Bioinformatics 26, 139-140 (2010). 

36. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change 
and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 
(2014). 

37. Taylor, R. W., Taylor, G. A., Durham, S. E. & Turnbull, D. M. The determination of 
complete human mitochondrial DNA sequences in single cells: implications 
for the study of somatic mitochondrial DNA point mutations. Nucleic Acids Res. 
29, e74 (2001). 


38. Andrews, R. M. et al. Reanalysis and revision of the Cambridge 
reference sequence for human mitochondrial DNA. Nature Genet. 23, 147 
(1999). 

39. Greaves, L. C. et al. Clonal expansion of early to mid-life mitochondrial DNA 
point mutations drives mitochondrial dysfunction during human ageing. PLoS 
Genet. 10, e1004620 (2014). 

40. Wamaitha, S. E. et al. Gata6 potently initiates reprograming of pluripotent and 
differentiated cells to extraembryonic endoderm stem cells. Genes Dev. 29, 
1239-1255 (2015). 

41. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and 
quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5, 
621-628 (2008). 

42. White, H. E. et al. Accurate detection and quantitation of heteroplasmic 
mitochondrial point mutations by pyrosequencing. Genet. Test. 9, 190-199 
(2005). 




[Extended Data Figure 1 panel labels: completion of meiosis (metaphase II, anaphase II; first polar body); first mitotic cell cycle (G1, S and G2 phases); sperm injected (ICSI); ePNT; unmanipulated zygote; ePNT zygote.] 

Extended Data Figure 1 | ltPNT and pronuclei centralization after 
ePNT. a, Schematic showing timing of ltPNT and ePNT. b, Images 
showing the stages of the ltPNT process: left, late pronuclear zygote; 
middle, enucleation; right, fusion. Scale bar, 20 µm. Note large pronuclei 
and pipette size compared with Fig. 1b and Supplementary Videos 1 and 2. 
c, Images show unmanipulated and ePNT zygotes at 16-18 h after 
fertilization. Scale bars, 20 µm. d, Pronuclei centralization and division 
to the two-cell stage assessed by live cell imaging in control and ePNT 
zygotes (not significant). Comparisons by χ² test. 

Extended Data Figure 2 | Blastocyst morphology and effect of PNT on 
blastocyst development and quality. a, Schematic showing cell lineages 
in a mammalian blastocyst: trophectoderm, primitive endoderm and 
epiblast. b, Morphological criteria and scoring system used for grading 
human blastocysts*’. Top, degree of expansion ranging from an early, 
unexpanded blastocyst (score 1) to fully expanded (score 6). Middle, range 
of ICM morphologies from absent (score 1) to large but tightly packed 
(score 5). Bottom, range of trophectoderm morphologies from scant and 
discontinuous (score 1) to a fully formed layer of tightly packed cells (score 
3). Box colours correspond to the grades shown in c. c, Table used to assign 
blastocyst grades, according to levels of expansion, and morphology of the 
ICM and trophectoderm. Grade F (not shown) was assigned to embryos 
that developed to the blastocyst stage but subsequently showed signs of 
degeneration. d, Graph showing the relationship between blastocyst grade 
and implantation. Data obtained from clinical IVF cycles (n = 531) in 
which unmanipulated single blastocysts were replaced on day 5. 
Implantation is defined by the detection of a fetal heartbeat at 6 weeks 
after IVF treatment. There was no case in which a grade D or F blastocyst 
was replaced. P values are shown (χ² test). e, ltPNT experimental 
conditions, blastocyst formation (P < 0.01; χ² test) and quality. A total 
of 51 zygotes from 10 donors were allocated either to an unmanipulated 
control group (Ctr.; n = 12) or to ltPNT involving transfer between pairs of 
zygotes from the same donor (n = 29) or replacement back into the same 
zygote (autologous PNT (Atlg.) n = 10). f, ePNT (series I) experimental 
conditions, blastocyst formation and quality. This series of experiments 
involved a total of 58 zygotes from 13 donors. Zygotes were allocated to a 
control group (n = 19), or to ePNT involving either autologous (n = 18) or 
heterologous (Het.; n = 21) transfers. Differences are not significant 
(χ² test and Fisher’s exact test). g, Image of an ePNT blastocyst fixed 
on day 6 and stained with 4′,6-diamidino-2-phenylindole (DAPI). 
Scale bar, 50 µm. h, Cell number assessed by nuclear counts showing 
comparable numbers in control and ePNT (series II) blastocysts and 
significantly reduced numbers in ltPNT and ePNT (series I) blastocysts 
(P = 0.001; one-way analysis of variance (ANOVA) with Tukey’s HSD test). 
Mean ± standard deviation (s.d.) calculated from individual blastocysts, 
numbers indicated on the x-axis. 

Extended Data Figure 3 | Survival and blastocyst development after 
ePNT between zygotes obtained from freshly harvested and vitrified 
oocytes. a, Experimental scheme for heterologous ePNT in series II. 
Because of unpredictability in the response to ovarian stimulation, 
heterologous transfers involved reciprocal ePNT between zygotes 
generated from freshly harvested and vitrified oocytes. This resulted 
in reconstituted zygotes whose cytoplasts originated from a fresh 
oocyte (FreshCy), or from a vitrified oocyte (VitCy). Oocytes for these 
experiments were vitrified predominantly at the MII stage (blue box; 
n = 80 zygotes; 25 donors). We also conducted a series of experiments 
to determine whether vitrification at the 2PB stage (green box; n = 34 
zygotes; 13 donors) would promote improved blastocyst formation. 
b, Survival of reconstituted zygotes as a proportion of those submitted 
to autologous (Atlg.) and heterologous (Het.) ePNT according to the 
stage of vitrification (MII or 2PB) and according to whether the cytoplast 
was derived from a fresh (FreshCy) or a vitrified (VitCy) oocyte. Loss 
was generally due to karyoplast lysis, excessive leakage of cytoplasm, or 
degeneration of reconstituted zygotes during subsequent incubation. 
Differences are not significant (χ² test). c, Sucrose was initially included 
in the manipulation medium to facilitate enucleation and fusion; however, 
it was later removed because the data indicated that the osmotic effect 
resulted in increased mtDNA carryover (see Fig. 4b). Omission of sucrose 
from the enucleation medium had a small, but not significant, effect on 
survival of zygotes whose cytoplasts originated from vitrified MII oocytes 
(χ² test). d, Blastocyst formation as a percentage of zygotes submitted 
to the ePNT procedure recorded on days 5 and 6 after fertilization. 
e, Blastocyst formation recorded on days 5 and 6 as a percentage of zygotes 
that survived the ePNT procedure. The numbers in each group and P 
values are shown, χ² test. f, Blastocyst quality grades (see Extended Data 
Fig. 2c, d) on days 5 and 6 (not significant; Fisher’s exact test). Source data 
files are available online. 

Extended Data Figure 4 | Array-CGH results for PNT blastocysts. Summary of array-CGH results obtained from ICM and trophectoderm samples 
from control (n= 11) and ePNT (n= 30) blastocysts. Blastocysts are ordered by grade within experimental groups. The karyoplast donor age is also 
shown. 

Extended Data Figure 5 | Experimental approach and bioinformatics 
analysis of single-cell RNA-seq data from ePNT and control blastocysts. 
a, Flow diagram showing the steps involved in RNA-seq of single cells 
microdissected from human blastocysts. b, Summary table of control 
and ePNT blastocysts submitted for RNA-seq analysis. For the purpose 
of gene expression analysis, we distinguish between ePNT blastocysts 
derived from fusion of cytoplasts and karyoplasts with the same, and 
different, mitochondrial genomes. Those with the same mitochondrial 
genomes included blastocysts from autologous ePNT and from a zygote 
pair donated by two sisters, which we refer to as homologous ePNT. 
Blastocysts arising from heterologous ePNT represent new combinations 
of nuclear and mitochondrial genome and are subgrouped according to 
the cytoplast origin (see Extended Data Fig. 4). c, Flow diagram outlining 
the bioinformatics analysis of RNA-seq data. Data were normalized 
either as reads per kilobase per million mapped reads (RPKM) (ref. 41) or using 
DESeq2 (ref. 36). Normalized data were used to generate PCA plots, t-SNE 
plots and heatmaps. 
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Extended Data Figure 6 | Analysis of differential gene expression in 
good quality ePNT and control blastocysts. a, PCA matrix using the first 
ten principal components of DESeq2 VST normalized data for the top 
12,000 most variable genes. Global gene expression is indistinguishable 
between unmanipulated control and ePNT samples, PC1 versus PC2 
highlighted in blue box. b, PCA matrix as shown in a, distinguished by 
lineage, clearly seen in PC2 versus PC3 (pink box). c, t-SNE analysis 
after DESeq2 VST normalization of 6,000 of the most variably expressed 
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genes, where samples were distinguished by lineage. Sample numbers and 
blastocyst grades are shown. Autologous and homologous ePNT samples 
are derived from blastocysts in which the karyoplast and cytoplast had the 
same mitochondrial genome. Heterologous ePNT samples were derived 
from pairs of zygotes with different mitochondrial genomes (Extended 
Data Fig. 5). Samples from experimental controls and reference population 
were combined for the purpose of the analyses shown in a and b. 
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Extended Data Figure 7 | Expression of lineage-specific genes and 
mitochondrial OXPHOS genes in control and ePNT embryos. 
a, Heatmap showing log2-transformed RPKM values of selected 
differentially expressed genes in trophectoderm (n= 10), epiblast (n = 10) 
and primitive endoderm (n = 10) lineages. b, Heatmap showing expression 
of mitochondrial OXPHOS genes after unsupervised hierarchical 
clustering. Expression of OXPHOS genes encoded by mtDNA is variable 
both within and between blastocysts. Control and ePNT samples cluster 
together, irrespective of whether the cytoplast and karyoplast had the 
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same (blue font) or different (purple font) mtDNA. Sample numbers 

and blastocyst grades are shown. The reference population includes a 
previously published series'*. Autologous and homologous ePNT samples 
are derived from blastocysts in which the karyoplast and cytoplast had the 
same mitochondrial genome. Heterologous ePNT samples were derived 
from pairs of zygotes with different mitochondrial genomes (Extended 
Data Fig. 5b). Expression levels are indicated on a high-to-low scale 
(purple-white-green). Source data files are available online for a and b. 
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Extended Data Figure 8 | Measurement of heteroplasmy due to mtDNA 
carryover during ePNT. a, Mitochondrial genotypes were determined 
by identifying polymorphic variants in the hypervariable mtDNA control 
regions of each donor. Sequence electropherograms of mtDNA non- 
coding control region with a sequence variant used for pyrosequencing 
(highlighted) (m.73A>G). b, Sequence pyrograms for the mtDNA 
variant (m.73A>G) in control samples. The expected level of variant is 
given along with the level determined by pyrosequencing (in brackets). 

c, Examples of the standard curve generated to increase accuracy in 
detecting low levels (0-25%) of heteroplasmy by pyrosequencing, which 
has previously been reported to accurately detect heteroplasmy at a level 
of 1% (ref. 42). Each data point represents the mean of 3-4 technical 
replicates. d, mtDNA carryover was measured by pyrosequencing using 
clumps of cells (n = 92) from day 6 blastocysts (n = 40; names shown on 
y axis). The cells were predominantly obtained from the trophectoderm 
(TE) layer (purple, n = 67). ICM cells (red, n = 5) and cells of mixed 
trophectoderm/ICM origin (green, n = 20) were also analysed. Each data 
point represents the mean of 2-3 technical replicates. e, mtDNA carryover 
from individual blastocysts calculated from data shown in d. Each data 
point represents either the mean value where more than one sample was 
tested (n = 28 ePNT blastocysts), or a single value where only one sample 
was tested (n = 12 ePNT blastocysts). Horizontal lines show median values 
for each experimental group. Blastocysts arising from ePNT performed 
in the absence of sucrose and fused with a fresh cytoplast (FreshCy) had 
significantly reduced mtDNA carryover compared with blastocysts where 
ePNT was performed in the presence of sucrose (P values and blastocyst 
numbers are shown; two-sided Mann-Whitney U-test). f, Graph 
showing the proportions of blastocysts (n = 40) with mtDNA carryover 
measurements falling within the specified levels (not significant; χ² test). 
Source data files are provided for c-f. 
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Extended Data Figure 9 | Derivation and characterization of human 

ES cells from control and ePNT blastocysts. a, Examples of outgrowths 
formed following explantation of the ICM from ePNT (n= 15) and control 
(n=6) blastocysts used for hES cell derivations. The dashed white circle 
indicates the region picked for initial passage of the ICM outgrowth. 
Bottom, examples of hES cell colonies. b, Example of a normal karyogram 
from a PNT-hES cell line (45PNT); 4/4 lines tested showed a normal 
karyotype. The remaining hES cell line did not grow beyond passage 2 
and was derived from a uniformly aneuploid blastocyst (55PNT; Extended 
Data Fig. 4). c, Immunostaining of representative PNT-hES cells (grown in 
mTeSR1) for NANOG, SSEA4 (green), SOX2 and OCT4 (red) with DAPI 
(blue) merge. Graph shows quantitative polymerase chain reaction with 
reverse transcription (qRT-PCR) analysis of control and PNT-hES cell 
lines for pluripotency transcripts NANOG, POU5F1 and SOX2. Horizontal 
line shows the median value, which is similar between hES cells from 
unmanipulated control blastocysts (Ctr.; n = 2 hES cell lines) and 
ePNT-hES cells (n = 4 ePNT-hES cell lines). d, Immunostaining 
of representative PNT-hES cells after 20 days in basal MEF media, 
confirming differentiation into all three germ layers: endoderm 
(α-fetoprotein (AFP); SOX17), mesoderm (α-smooth muscle actin (SMA); 
desmin (DES)) and ectoderm (β-III tubulin (TUJ1); SOX1) in green or 
red, with DAPI (blue) merge. Scale bars, 50 μm. e, Table showing the 
mtDNA variants and primers used to measure mtDNA carryover in 
PNT-hES cell lines. f, Summary table showing details of blastocysts 
and the corresponding hES cells. Aneuploidy in PNT-hES cell lines 
was analysed by metaphase spreads, except for 31PNT-hES, which was 
determined by array-CGH. 
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Extended Data Figure 10 | Heteroplasmy in subclones of the hES 
cell line derived from 36PNT. a, The 36PNT hES cell line was frozen 
at passage 3 (after derivation), thawed and subcloned to monitor 
heteroplasmy arising from the karyoplast donor mtDNA. Six colonies 
(15-20) were randomly selected at the first post-thaw passage (P3) and 
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clumps of cells were plated on 3 tissue culture wells; 5/6 colonies gave rise 
to 3 subclones, which were grown to P11. Subclones are distinguished by 
colour in the graphs. Each data point represents the mean of two technical 
replicates for a single cell sample. Source data file is available online. 
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Co-repressor CBFA2T2 regulates pluripotency and 


germline development 


Shengjiang Tu, Varun Narendra, Masashi Yamaji, Simon E. Vidal, Luis Alejandro Rojas, Xiaoshi Wang, Sang Yong Kim, 
Benjamin A. Garcia, Thomas Tuschl, Matthias Stadtfeld & Danny Reinberg 


Developmental specification of germ cells lies at the heart of 
inheritance, as germ cells contain all of the genetic and epigenetic 
information transmitted between generations. The critical 
developmental event distinguishing germline from somatic 
lineages is the differentiation of primordial germ cells (PGCs)'”, 
precursors of sex-specific gametes that produce an entire organism 
upon fertilization. Germ cells toggle between uni- and pluripotent 
states as they exhibit their own ‘latent’ form of pluripotency. For 
example, PGCs express a number of transcription factors in 
common with embryonic stem (ES) cells, including OCT4 (encoded 
by Pou5f1), SOX2, NANOG and PRDM14 (refs 2-4). A biochemical 
mechanism by which these transcription factors converge on 
chromatin to produce the dramatic rearrangements underlying 
ES-cell- and PGC-specific transcriptional programs remains 
poorly understood. Here we identify a novel co-repressor protein, 
CBFA2T2, that regulates pluripotency and germline specification 
in mice. Cbfa2t2−/− mice display severe defects in PGC maturation 
and epigenetic reprogramming. CBFA2T2 forms a biochemical 
complex with PRDM14, a germline-specific transcription factor. 
Mechanistically, CBFA2T2 oligomerizes to form a scaffold upon 
which PRDM14 and OCT4 are stabilized on chromatin. Thus, 
in contrast to the traditional ‘passenger’ role of a co-repressor, 
CBFA2T2 functions synergistically with transcription factors at the 
crossroads of the fundamental developmental plasticity between 
uni- and pluripotency. 

The germ line first segregates from somatic lineages via the specification 
of PGCs between embryonic day (E)6.25-7.25 in mice. 
PRDM14 regulates pluripotency, and is the only known transcription 
factor to specifically regulate germ cell specification. To understand 
better the mechanism(s) underlying PGC development, we 
sought PRDM14-interacting proteins in the human germ-cell tumour 
cell line NCCIT. NCCIT cells stably expressing Flag-PRDM14 were 
subjected to affinity purification and proteomic analysis. In contrast 
with previous reports’~’, neither EZH2 nor other polycomb repressive 
complex 2 (PRC2) components co-purified with PRDM14. Instead, 
the strongest identified interaction involved a co-repressor protein, 
CBFA2T2 (Extended Data Fig. 1a). Reciprocal affinity purification of 
Flag-haemagglutinin (HA)-tagged CBFA2T2 confirmed strong inter- 
action with PRDM14 (Extended Data Fig. 1a). CBFA2T2, CBFA2T3 
and ETO (also known as RUNX1T1) comprise a homologous gene 
family frequently targeted for translocation events in acute myeloid 
leukaemia'”!°. Despite 85% sequence similarity among homologues, 
their ubiquitous expression and capacity to form heterotetramers!*'*, 
ETO and CBFA2T3 were barely detectable (Extended Data Fig. 1a). 
This specificity for CBFA2T2 aligns with published microarray data 
indicating that it is the only family member upregulated during induced 
pluripotent stem (iPS) cell reprogramming!’ and PGC specification”. 


Further reciprocal immunoprecipitations confirmed endogenous 
PRDM14 and CBFA2T2 interaction in both NCCIT cells and mouse 
ES cells (Fig. 1a and Extended Data Fig. 1b, c). Gel filtration of the 
Flag eluate gave evidence of a larger than 600 kDa complex (Extended 
Data Fig. 1d), possibly due to CBFA2T2 oligomerization“. Moreover, a 
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Figure 1 | PRDM14 and the co-repressor protein CBFA2T2 interact 
and bind to chromatin interdependently. a, Immunoprecipitation (IP) 
using antibodies against the indicated endogenous proteins in NCCIT 
cells. For all western blots, source gel data are included in Supplementary 
Fig. 1. b, Venn diagram depicting the overlap of PRDM14 and CBFA2T2 
target genes as identified by ChIP-seq. c, Genome browser tracks showing 
PRDM14 and CBFA2T2 at their respective genomic loci. d, e, ChIP-qPCR 
at SERs found near the 11 indicated genes in NCCIT cells with siRNAs 
against CBFA2T2 (e) or shRNAs against PRDM14 (d) (n =3 biological 
replicates). KD, knockdown. Error bars show standard deviation (s.d.). 
qPCR source data are included in the Supplementary Information. 
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GAL4 recruitment assay demonstrated that GAL4-PRDM14 recruited 
CBFA2T2, but not EZH2, to the chromatinized luciferase promoter 
(Extended Data Fig. 1e). 

To ascertain PRDM14 and CBFA2T2 colocalization on chromatin 
genome wide, we performed chromatin immunoprecipitation followed 
by sequencing (ChIP-seq) in wild-type NCCIT cells. In the case of 
CBFA2T2, 2,077 statistically enriched regions (SERs) were identified 
using a stringent P-value threshold of 1 × 10⁻¹⁰, of which 1,384 overlapped 
with a PRDM14-binding event (Extended Data Fig. 2a and 
Supplementary Table 1). Global mapping of SERs to their nearest pro- 
moters identified 1,022 PRDM14/CBFA2T2 co-targeted genes (Fig. 1b), 
many of which are transcription factors involved in lineage commit- 
ment (Extended Data Fig. 2b, c). By contrast, PRDM14 exhibited very 
limited overlap with PRC2 or Polycomb repressive complex 1 (PRC1) 
(Extended Data Fig. 2a). Interestingly, PRDM14 and CBFA2T2 co-bind 
the genomic loci from which they are transcribed (Fig. 1c). 
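
The overlap statistic quoted above (1,384 of 2,077 CBFA2T2 SERs coinciding with a PRDM14-binding event) is the kind of figure an interval-intersection step produces. The following is a minimal sketch of such a count, assuming half-open genomic intervals; the function name and the toy coordinates are hypothetical and do not reproduce the peak-calling or overlap criteria actually used in the study.

```python
from collections import defaultdict


def count_overlapping(query, reference):
    """Count query intervals that overlap at least one reference interval.

    Intervals are (chrom, start, end) tuples with half-open coordinates.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))

    hits = 0
    for chrom, start, end in query:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < r_end and r_start < end for r_start, r_end in by_chrom[chrom]):
            hits += 1
    return hits


# Hypothetical toy peak sets (not real SER coordinates).
cbfa2t2_sers = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
prdm14_sers = [("chr1", 150, 250), ("chr2", 300, 400)]
n_overlap = count_overlapping(cbfa2t2_sers, prdm14_sers)

# Fraction implied by the counts reported in the text.
reported_fraction = 1384 / 2077
print(n_overlap, round(reported_fraction, 3))
```

With the reported counts, about two-thirds of CBFA2T2 SERs overlap a PRDM14-binding event. For genome-scale peak lists a sorted or tree-based intersection would be preferable to this quadratic scan.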

The sequence-specific DNA-binding capacity of PRDM14 coupled 
to the co-repressor activity of CBFA2T2 suggested a hierarchical 
model of chromatin recruitment. We performed knockdowns of 
PRDM14 or CBFA2T2 using short hairpin RNAs (shRNAs) or short 
interfering RNAs (siRNAs). As expected, PRDM14 knockdown resulted 
in a loss of CBFA2T2 localization at 11/11 common target genes 
(Fig. 1d). Surprisingly, CBFA2T2 knockdown caused a reciprocal loss 
of PRDM14 binding to the same genes (Fig. 1e), with minimal effect 
on PRDM14 expression (Extended Data Fig. 2d). Thus, PRDM14 
localization to chromatin depends on its DNA-binding activity and its 
association with CBFA2T2. 

PRDM14 is required to repress lineage commitment genes and ensures 
naive pluripotency in mouse embryonic stem (mES) cells®’. To examine 
such a role for CBFA2T2, we generated Cbfa2t2- and Prdm14-knockout 
cells in KH2 mES cells’? using CRISPR-Cas9 genome editing”®. 
Guide RNAs (gRNAs) targeting the sixth exon (common to all Cbfa2t2 
isoforms) or the second exon of Prdm14, produced multiple lines 
harbouring distinct frameshift mutations and loss of the targeted 
protein (Fig. 2a and Extended Data Fig. 3a, b). Colonies of Prdm14- 
and Cbfa2t2-knockout mES cells displayed a flattened morphology 
(Extended Data Fig. 3c). Both mutant lines ceased to grow and could 
not be maintained in the absence of kinase inhibitors of MAPK/ERK 
and GSK3 (2i)?! (Extended Data Fig. 3d), as shown in the case of 
Prdm14-knockout lines’. After exposure to 2i-free conditions, three dif- 
ferent knockout lines for both Prdm14 and Cbfa2t2, alongside wild-type 
cells, were subjected to RNA sequencing (RNA-seq) analyses. Eighty- 
five per cent of genes differentially expressed in a Prdm14-knockout 
setting were also dysregulated upon loss of Cbfa2t2 expression 
(Fig. 2b, Extended Data Fig. 3e and Supplementary Table 2). Moreover, 
the directionality of differential gene expression was nearly identical 
across mutants (Fig. 2c and Extended Data Fig. 3f). In both knockout 
ES cells, numerous pluripotency genes, including Klf4, Pou5f1, Nr0b1 
(also known as Dax1), Lin28a and Myc, were downregulated, whereas 
lineage commitment genes such as Elf3, Cdx1 and Pitx2 were upregulated. 
Similar to the case with PRDM14 (ref. 5), CBFA2T2 overexpression 
enhanced iPS cell reprogramming efficiency (Extended Data 
Fig. 3g, h). Thus, the CBFA2T2 co-repressor contributes positively to 
pluripotency. 

Given that Prdm14−/− mice displayed a major defect in germline 
development, we tested the contribution of CBFA2T2 to both somatic 
and germline development by generating Cbfa2t2-knockout mice via 
CRISPR zygotic injection. C57BL/6 zygotes were co-injected with 
Cas9 messenger RNA and one of the gRNAs used in mES cells to tar- 
get exon 6 of Cbfa2t2 (Fig. 3a). We obtained multiple pups possessing 
a genetic lesion that caused a frameshift mutation and a dysfunc- 
tional truncated protein (Extended Data Fig. 4a). Genetic targeting 
was specific, as the ten most likely off-target genomic regions were 
unperturbed (Supplementary Table 3). Intercrossing of Cbfa2t2+/− 
mice produced pups in a roughly normal Mendelian ratio (25% +/+ 
(30); 58% +/− (70); 17% −/− (21)) and Cbfa2t2−/− animals appeared 
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Figure 2 | PRDM14 and CBFA2T2 regulate pluripotency. a, Western 
blots confirming loss of PRDM14 or CBFA2T2 expression in knockout 
(KO) mES cells. Nonspecific bands are denoted with an asterisk. WT, 
wild type. P-KO and C-KO series represent three independently derived 
Prdm14- or Cbfa2t2-knockout lines, respectively. b, Venn diagram 
depicting the overlap of differentially expressed genes upon deletion of 
Prdm14 or Cbfa2t2 after removal of feeders from fetal bovine serum (FBS) 
plus leukaemia inhibitory factor (LIF) plus feeders culture. c, Heat map 
showing relative expression of all differentially expressed genes identified 
with a false discovery rate (FDR) less than 1 x 10“ between wild-type and 
either Prdm14- or Cbfa2t2-knockout mES cells. 


normal. However, crosses of those mice (two female and three male) 
with wild-type C57BL/6 counterparts failed to produce pups over 
2 months. 
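
The genotype counts reported for the heterozygote intercross (30 +/+, 70 +/−, 21 −/−) can be compared with the 1:2:1 Mendelian expectation using a chi-square goodness-of-fit test. The sketch below is an illustrative check only and is not an analysis described by the authors; it relies on the fact that, for two degrees of freedom, the chi-square upper-tail probability equals exp(−x/2).

```python
import math

# Observed pup genotypes from the Cbfa2t2 +/- intercross, as reported in the text.
observed = {"+/+": 30, "+/-": 70, "-/-": 21}
mendelian_ratio = {"+/+": 0.25, "+/-": 0.50, "-/-": 0.25}

total = sum(observed.values())
expected = {g: total * p for g, p in mendelian_ratio.items()}

# Pearson chi-square statistic over the three genotype classes.
chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

# With 3 classes there are 2 degrees of freedom, for which the
# chi-square survival function has the closed form exp(-x / 2).
p_value = math.exp(-chi2 / 2)

percentages = {g: round(100 * n / total) for g, n in observed.items()}
print(percentages, round(chi2, 2), round(p_value, 3))
```

Under these assumptions the statistic is about 4.32 with P of roughly 0.12, so the counts do not reject a 1:2:1 ratio at the conventional 0.05 level, in line with the description of a roughly normal Mendelian ratio.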

To pinpoint the germline defect underlying the fertility phenotype 
of Cbfa2t2−/− mice, we analysed anatomical and histological phenotypes 
of the reproductive organs. Female Cbfa2t2−/− adult mice have 
underdeveloped ovaries (Fig. 3b), showing a total absence of follicles 
on haematoxylin and eosin (H&E) staining (Fig. 3c). Similarly, testes 
of male Cbfa2t2−/− mice were reduced to ~30% of wild type (Fig. 3d 
and Extended Data Fig. 4b). Total number of sperm was reduced to 
less than 10% of wild type, while remaining sperm were largely immo- 
tile (Extended Data Fig. 4c) and unable to bind the zonae pellucidae 
of oocytes during in vitro fertilization. H&E staining of sections of 
Cbfa2t2−/− testes showed that 41% of seminiferous tubules did not 
contain spermatogenic cells (Fig. 3e). Furthermore, postnatal day 0 (P0) 
male Cbfa2t2−/− testes were almost completely devoid of gonocytes 
(Extended Data Fig. 4d). These data contrast with a previous study 
claiming that Cbfa2t2−/− mice are fertile. This discrepancy may be 
due to differing purity of the genetic background. 

To understand the germline phenotype observed in both sexes, we 
examined PGC development in Cbfa2t2−/− embryos. Alkaline phosphatase 
staining of the genital ridge of E11.5 Cbfa2t2−/− embryos 
showed greater than 95% reduction in the number of PGCs relative to 
wild type (Fig. 3f). This defect occurs even earlier, at E7.25-8.75 (Fig. 3g 
and Extended Data Fig. 4e, f). In accordance, SOX2 is not activated in 
the mutants (Extended Data Fig. 4g). Thus, CBFA2T2 is a novel factor 
required for specification and development of PGCs, and the defect in 
this process results in germ cell depletion. 

We next mapped the genomic localizations of PRDM14 and 
CBFA2T2 in mES cells, relative to that of OCT4, SOX2 and NANOG 
(OSN) from published ChIP-seq data. CBFA2T2 and PRDM14 colocalize 
broadly across the genome in mES cells (Fig. 4a), and also exhibit 


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Figure 3 panel residue. a, Cbfa2t2 gene schematic (exons 3-13) with the 20 nt gRNA target GAACCGCCTGCCAAGAGAGT; Cas9 and gRNA are injected into the zygote and embryos are transferred to a pseudopregnant female. g, E8.75 (9 somites) PGC counts plotted for +/+, +/− and −/− Cbfa2t2 genotypes.] 


Figure 3 | Cbfa2t2−/− mice are defective in their germ line. a, Schematic 
of Cbfa2t2-knockout mouse generation by CRISPR zygotic injection. 
nt, nucleotide. b, d, Image of dissected ovaries (n = 8) and testes (n = 4), 
respectively, in Cbfa2t2-knockout mice. Scale bars, 1 mm. c, e, Histological 
sections of ovaries and testes, respectively, stained by H&E. Scale bars, 
100 μm. f, Genital ridges of Cbfa2t2+/+ and Cbfa2t2−/− embryos at E11.5 


considerable overlap with OSN, as reported for PRDM14 in human ES 
cells. As in the case of NCCIT cells, CBFA2T2-PRDM14 target genes 
include numerous lineage-commitment transcription factors and chromatin 
regulators (Extended Data Fig. 5a and Supplementary Table 4), 
many of which are co-occupied by OCT4, including the histone H3K9 


stained by alkaline phosphatase. Scale bar, 1 mm. g, Alkaline phosphatase 
staining of PGCs of E8.75 (9 somites) embryos is shown (n= 3). 
Arrowheads point to the boundary of the developing hindgut. pm, 
para-axial mesoderm. Scale bar, 100 μm. PGC numbers in each embryo 
were plotted in the right panel, with the following values: +/+, 93 ± 5; 
+/−, 87 ± 5; −/−, 38 ± 1. 


The nervy homology 2 domain (NHR2) of ETO—a CBFA2T2 
homologue—is required for self-renewal of haematopoietic stem 
cells in leukaemia!*. NHR2 functions in homo- and heterotypic oli- 
gomerization by forming a four-helix bundle tetrameric structure. 
A seven amino acid ‘m7’ substitution within NHR2 disrupted oligomer- 
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Figure 4 | Mechanism of CBFA2T2-PRDM14 complex chromatin 
binding and direct regulation of PGC epigenetic reprogramming. 

a, Heat map depicting CBFA2T2, PRDM14, OCT4, SOX2 and NANOG 
ChIP-seq read density centred about the top 299 CBFA2T2 SERs in mES 
cells. b, Representative genome browser tracks at the indicated Ehmt1 
locus in mES cells. c, Domain annotation of wild-type, Cbfa2t2-knockout 
(CBF-KO) and oligomerization-mutant m7 proteins. The 7 amino acids 
mutated in Cbfa2t2-m7 (CBF-m7) are depicted as lines within NHR2. 

d, Immunoprecipitation (IP) against the indicated proteins in Cbfa2t2-m7 
mES cells followed by western blot. e-g, ChIP-qPCR using antibodies 
directed against CBFA2T2 (e), PRDM14 (f) or OCT4 (g) at SERs found 


near the indicated genes (n = 3). Error bars, s.d. qPCR source data are 
included in the Supplementary Information. h, EHMT1 expression (red) in 
AP-2γ-positive PGCs (green, arrowheads) in Cbfa2t2+/− and Cbfa2t2−/− 
embryos at E8.0, late head-fold (LHF) stage. i, Immunofluorescence 
analysis of H3K9me2 (red) of AP-2γ-positive (green; arrowheads) PGCs 
in Cbfa2t2+/− and Cbfa2t2−/− embryos at E8.75. Line plot analyses of the 
yellow-arrowed areas are shown on the right. Scale bars, 10 μm. Data are 
representative of three independent experiments. j, Model depicting the 
co-repressor CBFA2T2 oligomerizing to stabilize associated transcription 
factors (PRDM14 and OCT4) on chromatin. 
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to ES cell pluripotency, mES cells harbouring the m7 mutation in 
CBFA2T2 were generated using CRISPR-Cas9 technology (Fig. 4c 
and Extended Data Fig. 5b). Similar to Cbfa2t2−/− cells, Cbfa2t2-m7 
cells exhibited a flattened morphology (Extended Data Fig. 5c) and a 
total abrogation of CBFA2T2 occupancy at a number of target genes 
(Fig. 4e). Furthermore, while PRDM14 and OCT4 protein levels were 
unperturbed, as was biochemical interaction with PRDM14 (Extended 
Data Fig. 5d and Fig. 4d, respectively), CBFA2T2 oligomerization was 
required to stabilize PRDM14 and OCT4 on chromatin. ChIP with 
quantitative polymerase chain reaction (ChIP-qPCR) showed a signif- 
icant reduction in PRDM14 and OCT4 occupancy across 12/12 target 
genes tested (Fig. 4f, g). Importantly, PRDM14-CBFA2T2-independent 
OCT4 targets retained OCT4 binding (Extended Data Fig. 5e). Thus, 
CBFA2T2 oligomerization is a critical molecular event underpinning 
a pluripotent network, providing a scaffolding function to stabilize 
essential transcription factors such as PRDM14 and OCT4 at their 
target sites. 

CBFA2T2-PRDM14 targets comprise numerous components of 
the chromatin modifying machinery, such as EHMT1 (also known 
as GLP) (Fig. 4b, Extended Data Fig. 5a and Supplementary Table 4). 
During PGC development, H3K9me2 levels are reduced, potentially 
due to repression of the H3K9 methyltransferase EHMT1 via a presently 
unknown mechanism. Here, knockout of Prdm14 or Cbfa2t2 
in mES cells caused derepression of Ehmt1 (Extended Data Fig. 5f). 
Quantitative analysis showed a specific increase in H3K9me2 and 
H3K9me3 levels in Prdm14−/−, Cbfa2t2−/− and Cbfa2t2-m7 mutant 
mES cells (Extended Data Fig. 5g). Importantly, CBFA2T2-PRDM14-mediated 
repression was required to maintain appropriate levels of 
H3K9me2 in PGCs in vivo. PGCs in E8.0 Cbfa2t2−/− embryos exhibited 
a specific EHMT1 derepression (Fig. 4h), with resultant increased 
H3K9me2 levels at E8.75 (Fig. 4i and Extended Data Fig. 5h, i). Thus, 
direct control of global levels of chromatin modifications is probably 
another mechanism by which PRDM14 and CBFA2T2 regulate the 
delicate balance between self-renewal and lineage specification (Fig. 4j). 

In summary, CBFA2T2, a co-repressor protein, is a novel factor 
regulating pluripotency and is essential for germline development. In 
contrast to the long-held notion that co-repressors have a passive role 
in transcription factor recruitment, CBFA2T2, without intrinsic DNA- 
binding capacity, is required to stabilize both PRDM14 and OCT4 on 
chromatin via its oligomerization. While PRDM14 and OCT4 may 
independently bind DNA, their affinity-based ‘on rate’ is insufficient 
to generate a functional regulatory influence on transcription. Instead, 
CBFA2T2 oligomerization provides a larger interaction surface to limit 
its ‘off rate’ from chromatin, allowing for stable transcription factor 
binding (Fig. 4j). Such a model may extend to numerous transcription 
factors for which associated co-repressors or co-activators have yet to 
be identified. 

Note added in proof: During the revision of this manuscript, another 
study utilized an in vitro differentiation system to determine the 
involvement of CBFA2T2 in PGC formation, in accordance with our 
in vivo findings”®. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


No statistical methods were used to predetermine sample size. For mouse studies, 
no randomization or blinding was done. 

Cell lines and cultures. NCCIT cells (CRL-2073) were obtained from ATCC 
and grown in RPMI1640 media with 10% FBS, L-glutamine, penicillin/ 
streptomycin and sodium pyruvate. 293Trex cells (ThermoFisher, #R710-07) were 
grown in DMEM with 10% FBS, L-glutamine and penicillin/streptomycin. KH2 
ES cells, described previously, were grown in DMEM supplemented with 15% 
FBS, L-glutamine, penicillin/streptomycin, non-essential amino acids, 0.1 mM 
β-mercaptoethanol, LIF and 2i inhibitors (1 μM MEK1/2 inhibitor (PD0325901) 
and 3 μM GSK3 inhibitor (CHIR99021)). On feeder conditions, 2i was omitted. 
Human fibroblast BJ cells were obtained from ATCC and maintained on fibroblast 
medium: DMEM knockout media with 10% FBS, 1% non-essential amino 
acids, L-glutamine, penicillin/streptomycin and 0.1 mM 
β-mercaptoethanol. Human iPS cell culture medium contains advanced DMEM/ 
F12 plus 20% Knockout Serum Replacement, L-glutamine, penicillin/streptomycin, 
non-essential amino acids, 0.1 mM β-mercaptoethanol plus 10 ng ml⁻¹ FGF2 
(Peprotech). Cell lines were verified by western blots and PCR, and tested for 
mycoplasma contamination. 

To generate PRDM14 or CBFA2T2 NCCIT stable lines, CAG-eGFP vector”? 
was used to clone N-terminal Flag-tagged and C-terminal HA-tagged target 
gene constructs. After 2 weeks puromycin selection, single colonies were picked 
and expanded. Similarly, GAL4-PRDM14 293Trex stable line was generated 
by transfecting pcDNA4-TO plasmid (ThermoFisher, #V1020-20) with an 
N-terminal GAL4, C-terminal HA fusion PRDM14 construct. For Prdm14- or 
Cbfa2t2-knockout KH2 lines, gRNAs were cloned into pSpCas9 (BB)-2A-eGFP 
vector (Addgene, px458)?°. For Prdm14 knockout, the gRNA sequence was: 
GCGATGGCCTTACCGCCCTC. For Cbfa2t2 knockout, two gRNAs were used: 
ACTCTCTTGGCAGGCGGTTC and CTGGCCCCCAGGATTCATAA. For 
Cbfa2t2 m7 knock-in lines, a gRNA sequence, AGAGAAAACTAGGCGCTCCA, 
targeting the NHR2 domain was chosen for cloning into the Cas9 vector. For this 
knock-in, a donor 723 bp gBlock DNA centred at the NHR2 domain sequence 
was PCR amplified and purified. Mouse KH2 ES cells were transfected with the 
above Cas9-gRNA-eGFP plasmids with Lipofectamine 2000. In the case of m7 
knock-in, the 723 bp template was included. Medium-to-high GFP population was 
FACS sorted and seeded at 20,000 cells per 15-cm plate. Seven days later, ES-cell 
single colonies were selected, expanded and genotyped. 
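
The guide sequences listed above can be sanity-checked programmatically. The sketch below, with hypothetical dictionary keys and helper names, confirms that each gRNA is a 20 nt DNA string and computes the reverse complement of the first Cbfa2t2 guide, which reads GAACCGCCTGCCAAGAGAGT, the 20 nt target drawn in the Fig. 3a schematic. It is a bookkeeping aid only and says nothing about PAM placement or off-target scoring.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_valid_guide(seq, expected_length=20):
    """True if seq has the expected length and contains only A, C, G, T."""
    return len(seq) == expected_length and set(seq) <= set("ACGT")


# gRNA sequences as given in the Methods; the keys are hypothetical labels.
GUIDES = {
    "Prdm14_ko": "GCGATGGCCTTACCGCCCTC",
    "Cbfa2t2_ko_1": "ACTCTCTTGGCAGGCGGTTC",
    "Cbfa2t2_ko_2": "CTGGCCCCCAGGATTCATAA",
    "Cbfa2t2_m7": "AGAGAAAACTAGGCGCTCCA",
}

all_valid = all(is_valid_guide(seq) for seq in GUIDES.values())
cbfa2t2_target_strand = reverse_complement(GUIDES["Cbfa2t2_ko_1"])
print(all_valid, cbfa2t2_target_strand)
```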

Antibodies. Human PRDM14 antigen (N-terminal residue 1-243) was gen- 
erated by using PreScission protease to cleave the recombinant fusion protein 
GST-PRDM14(1-243). Rabbit polyclonal antibody was affinity purified by Affi-gel 
15 matrix conjugated with a His6-tagged antigen of PRDM14(1-243). Mouse 
PRDM14 antibody was described previously. Briefly, the N-terminal construct 
(amino acids 1-231) of mouse PRDM14 was cloned into pET30a vector (Novagen, 
#69909-3). The corresponding His6-tagged protein was overexpressed and purified 
for rabbit polyclonal antigen production. The antibody was purified with Affi-gel 
15 as mentioned earlier. In-house rabbit EZH2 antibody was reported previously”. 
Other antibodies used in this study were from commercial sources with the follow- 
ing catalogue numbers: anti-CBFA2T2, Bethyl A303-593A; anti-OCT4, Santa Cruz 
#sc-5279; anti-Ring1B, Bethyl #A302-869A; anti-SUZ12, Cell Signaling #37378; 
anti-HA, Abcam #ab9110; anti-tubulin, Abcam #ab6046; anti-TRA-1-81 biotin, 
eBioscience #13-8883-82; Alexa Fluor 660 Streptavidin, LifeTechnologies s21377; 
HRP Streptavidin, Biolegend #405210; MVH, Abcam #ab13840. 

Nuclear extracts, immunoprecipitation and affinity purification. Nuclear 
extracts were prepared with buffer A and buffer C, essentially as described*’. 
Cytosol fraction was removed by buffer A (20 mM Tris, pH 7.9, 10 mM KCl, 
0.5 mM dithiothreitol (DTT), 0.2 mM PMSF, 1 μg ml⁻¹ Pepstatin A, 1 μg ml⁻¹ 
Leupeptin, 1 μg ml⁻¹ Aprotinin). The pellet was resuspended in buffer C (20 mM 
HEPES, pH 7.5, 20% glycerol, 420 mM NaCl, 0.5 mM DTT, 0.2 mM EDTA, 0.2 mM 
PMSF, 1 μg ml⁻¹ Pepstatin A, 1 μg ml⁻¹ Leupeptin, 1 μg ml⁻¹ Aprotinin) and snap 
frozen with liquid nitrogen. For immunoprecipitation, 2-5 μg antibody was incubated 
overnight with 0.8 mg nuclear extract and immobilized on 40 μl of protein 
A:protein G beads (3:1 volume ratio). After 6 washes with BC350 (20 mM Tris, pH 
7.9, 350 mM NaCl, 0.1% NP40, 0.2 mM PMSF, 0.2 mM DTT), the immunoprecipitate 
was separated by SDS-PAGE for western blot analysis. For loading controls, 
5% of input nuclear extract was used. For Flag affinity purification, 10 mg nuclear 
extract was incubated with 100 μl Flag M2 beads overnight, and washed six times 
with BC350, as described earlier. Immunoprecipitate was eluted with 0.2 mg ml⁻¹ 
Flag peptide in BC100 (20 mM Tris, pH 7.9, 100 mM NaCl, 0.2 mM PMSF, 0.2 mM 
DTT). The Flag eluate was run into an SDS-PAGE gel for 1 cm. The upper gel slices 
containing proteins were excised and subjected to trypsin digestion and liquid 
chromatography-mass spectrometry (LC-MS) analysis. Digested peptides were 
desalted and concentrated using C18 stagetips for LC-tandem MS (LC-MS/MS) 
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analysis. One hundred and twenty minute gradients (6-75% acetonitrile) were 
used (nanoLC1000, Thermo Scientific) and spectra were recorded on an Orbitrap 
Velos (Thermo Scientific) by selecting the 15 most intense precursor ions for 
fragmentation in each full scan. 

PRDM14 and CBFA2T2 knockdowns. For PRDM14-knockdown experiments, the 
following shRNA sequences from Open Biosystems TRC pLKO.1 shRNA libraries 
were used: human PRDM14 shRNA 1: TTCTGTAGTGTCCATAGGACG; human 
PRDM14 shRNA 2: AACATGAAGAATGTGGATCCG; human PRDM14 shRNA 
3: TTGAAGGGAGTCTTTATCCAG. 

Lentiviruses from these shRNAs, as well as from the empty pLKO.1 vector, were
produced in 293T cells. Four million 293T cells were seeded on a 10-cm plate.
The next day, 2.3 µg plasmid (shRNA or control), 1.6 µg psPAX2 and 1.1 µg pMD2.G
second-generation packaging plasmids (Addgene, #12260, #12259) were transfected
with Lipofectamine 2000. Supernatants were collected 48 h and 60 h
post-transfection. Viral particles were filtered through 0.45-µm filters
and enriched 100-fold by centrifugation at 20,000 r.p.m. for 1.5 h. For transduction
of NCCIT cells by lentivirus, 0.20 million cells per well were seeded in 6-well plates.
The next day, cells were transduced with 20 µl virus and polybrene at a final
concentration of 8 µg ml⁻¹. Forty-eight hours post-transduction, 1 µg ml⁻¹
puromycin was added to select for transduced NCCIT cells. Transduced cells were
expanded and harvested after 1 week for ChIP-qPCR.

For CBFA2T2-knockdown experiments, the following On-TARGETplus siRNA 
sequences from Thermo Scientific were used: GAUCAUCGUUUGACAGAAA; 
CAGAUUCUCUCAGCAAUGA; UAGAGGAUAUUGCAACUUC; CCACAG 
AGAUUCAGCAAUG. 

Qiagen AllStars negative siRNA was used as control. siRNA (final concentration,
10 nM) was transfected with 4 µl Lipofectamine RNAiMAX (Life Technologies) in
each well of 6-well plates. Seventy hours post-transfection, cells were harvested for
western blot analysis. For ChIP chromatin preparation, CBFA2T2 siRNA
transfection was scaled up to two 10-cm plates. Cells were split once before
harvesting chromatin.

ChIP-qPCR and ChIP-seq. ChIP was done as described with biological
replicates*’. Briefly, cells were crosslinked with 1% formaldehyde, lysed, and
sonicated in buffer 3 (10 mM Tris, pH 7.9, 1 mM EDTA, 0.5 mM EGTA, 0.5%
N-lauroylsarcosine) down to a desired chromatin size. Forty microlitres of a 3:1
mixture of protein A and protein G Sepharose beads were blocked with 0.1 mg ml⁻¹
salmon sperm DNA and 1 mg ml⁻¹ bovine serum albumin (BSA). Approximately
100 µg chromatin was pre-cleared with half of the blocked beads, and incubated
with 2-5 µg antibody overnight in Tris buffer (10 mM Tris, 10 mM EDTA, 1%
Triton X-100, 0.1% sodium deoxycholate (DOC) and protease inhibitors). After
4 h incubation with the remaining protein A/G beads, samples were washed six
times with RIPA buffer (0.5 M LiCl, 50 mM HEPES, pH 7.5, 1 mM EDTA, 1%
NP-40, 0.7% DOC and protease inhibitors). After a brief wash with TE buffer
(10 mM Tris, 1 mM EDTA, 50 mM NaCl), samples were resuspended in 200 µl
of T50E10S1 (50 mM Tris, pH 8.0, 10 mM EDTA, 1% SDS) and incubated at 65 °C
overnight to reverse crosslinks. Samples were digested at 55 °C for 3 h with 10 µg
each of RNase A and proteinase K. Digested samples were PCR column purified
and diluted into 300 µl water for qPCR. For qPCR quantification, 5 µl SYBR Green
I Master mix (Roche), ROX reference dye, 3 µl water, 1 µl 5 µM primer pair, and
1 µl DNA were mixed for PCR amplification. For GAL4 ChIP-qPCR, the primer
sequences were: GAL4ChIPLucP5F: CACCGAGCGACCCTGCATAAGC; and
GAL4ChIPLucP5R: GCTTCTGCCAACCGAACGGAC. Other qPCR primers
are listed in Supplementary Table 3.
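
ChIP-qPCR threshold cycles are usually converted into an enrichment value before they are compared between conditions. The text does not state which formula was used, so the following is only a minimal sketch that assumes the common percent-of-input calculation and roughly 100% amplification efficiency. The function names, the `input_fraction` parameter and all Ct values are hypothetical.

```python
def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent of input chromatin recovered in a ChIP, from qPCR Ct values.

    Assumes the signal doubles every cycle. `input_fraction` is the share of
    chromatin set aside as input (a hypothetical value, not from the text).
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    return 100.0 * input_fraction * 2.0 ** (ct_input - ct_ip)


def fold_enrichment(signal_induced: float, signal_uninduced: float) -> float:
    """Ratio of ChIP signal with induction to ChIP signal without induction."""
    if signal_uninduced <= 0:
        raise ValueError("uninduced signal must be positive")
    return signal_induced / signal_uninduced


if __name__ == "__main__":
    # Made-up Ct values, for illustration only.
    induced = percent_input(ct_ip=24.0, ct_input=25.0, input_fraction=0.01)
    uninduced = percent_input(ct_ip=27.0, ct_input=25.0, input_fraction=0.01)
    print(f"fold enrichment: {fold_enrichment(induced, uninduced):.1f}")
```

A ratio of this kind is one way to express the change in ChIP signal on doxycycline induction; a different normalization would need a different first function.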

For PRDM14, CBFA2T2 and OCT4 ChIP-seq, 40 µl protein G beads per
immunoprecipitation were used, and salmon sperm DNA was omitted in the blocking
step. DNA was eluted in 30 µl elution buffer for library construction as described
later**.

RNA-seq. Mouse KH2 cells and knockout lines were grown on feeders for three 
consecutive passages on 6-well plates with FBS sera media supplemented with LIF. 
After the 4th passage (day 5), feeder cells were removed by trypsinizing and then 
plating the cells on a T25 flask for 35 min. The unattached ES cells were spun down 
and lysed with TRIzol. Standard RNA-seq procedure was used™.

Library construction. Libraries for ChIP-seq were prepared according to man- 
ufacturer’s instructions (Illumina). Briefly, immunoprecipitated DNA (~5 ng) 
was end-repaired using End-It Repair Kit (Epicentre), tailed with deoxyadenine 
using Klenow exo- (New England Biolabs), and ligated to custom adapters with T4 
Rapid DNA Ligase (Enzymatics). Fragments of 200-400 bp were size-selected using 
Agencourt AMPure XP beads, and subjected to PCR amplification using Q5 DNA 
polymerase (New England Biolabs). Libraries were quantified by qPCR using prim- 
ers annealing to the adaptor sequence and sequenced at a concentration of 12 pM 
on an Illumina HiSeq. Barcodes were used for multiplexing. For RNA-seq libraries, 
polyA+ RNA was isolated using Dynabeads Oligo(dT)25 (Invitrogen) and 
constructed into strand-specific libraries using the dUTP method. Once
dUTP-marked double-stranded cDNA was obtained, the remaining library construction
steps followed the same protocol as described earlier for ChIP-seq libraries.

Bioinformatic analysis. Sequenced reads from ChIP-seq experiments were
mapped to the hg19 or mm9 genome with Bowtie, using the parameters -v2 and 
-m4. Normalized genome-wide read densities were computed using a custom 
script and visualized on the UCSC genome browser after extending to the esti- 
mated size of a ChIP fragment (~200 nt). Enriched regions (ERs) were identi- 
fied using MACS 1.40rc2 with default parameters and an input control, and then 
filtered for ERs with at least 10 tags and an unadjusted P value of <1 x 10~** for 
ChIPs performed in NCCIT cells, and <1 x 10° for ChIPs performed in KH2 mES 
cells. ERs were associated to gene targets using the HOMER tool. Heatmaps were 
generated using a custom code in which reads were mapped to non-overlapping 
10-bp bins within peak-centred windows of 5-10 kb. Normalized cumulative read 
density across these bins is depicted. All Gene Ontology analysis was performed 
using the DAVID tool®. 
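
The heatmap binning described above can be sketched as follows. The custom code used in the study is not shown in the text, so this is only an illustration under stated assumptions: reads are reduced to single genomic coordinates, the window is centred on one peak, and the function names `binned_density` and `normalise` are hypothetical.

```python
from typing import List, Sequence


def binned_density(read_positions: Sequence[int], peak_centre: int,
                   window: int = 5000, bin_size: int = 10) -> List[int]:
    """Count reads in non-overlapping bins across a peak-centred window."""
    start = peak_centre - window // 2      # left edge of the window
    n_bins = window // bin_size
    counts = [0] * n_bins
    for pos in read_positions:
        offset = pos - start
        if 0 <= offset < n_bins * bin_size:  # ignore reads outside the window
            counts[offset // bin_size] += 1
    return counts


def normalise(counts: Sequence[int], total_mapped_reads: int,
              per: int = 1_000_000) -> List[float]:
    """Scale raw bin counts to reads per `per` mapped reads."""
    return [c * per / total_mapped_reads for c in counts]


if __name__ == "__main__":
    reads = [1000, 1004, 1010, 3499, 3500]   # toy coordinates
    profile = binned_density(reads, peak_centre=1000)
    print(len(profile), sum(profile))
```

Summing such per-peak profiles bin by bin over all peaks would give a cumulative density; whether the study normalised before or after summing is not stated.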

RNA-seq reads were assigned to genes using DEGseq (R package)** and the 
ENSEMBL annotation. FDR-adjusted P values for differential gene expression were 
calculated with DEseq (R package). Genes were considered to be ‘differentially 
expressed’ if their adjusted P value was <0.0001. 
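
The thresholding step can be illustrated with a small sketch. It assumes the FDR adjustment is the Benjamini-Hochberg procedure, which is a common choice but is not confirmed by the text, and it starts from per-gene P values rather than reproducing the count-based test of the R package. The function names and the gene names in the example are hypothetical.

```python
from typing import Dict, List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted
    # values never decrease as the raw P value increases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(p_by_gene: Dict[str, float],
                             threshold: float = 0.0001) -> List[str]:
    """Names of genes whose adjusted P value is below the threshold."""
    genes = list(p_by_gene)
    adjusted = benjamini_hochberg([p_by_gene[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q < threshold]


if __name__ == "__main__":
    toy = {"geneA": 1e-9, "geneB": 0.5, "geneC": 2e-5, "geneD": 0.2}
    print(differentially_expressed(toy))
```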

Human iPS cell reprogramming. PRDM14 and CBFA2T2 lentiviral constructs 
were cloned into pHAGE-EF1a-IRES-td tomato and pHAGE-EF1a-IRES-zGreen 
vectors, respectively. OKSM polycistronic vector under EF1a control was used 
for the reprogramming experiments*”. Lentiviruses were produced as described 
previously*®. Supernatants were collected every 12 h on two consecutive days
starting 48 h after transfection. Viral particles were concentrated by centrifugation at
20,000 r.p.m. for 1.5 h. Virus titre was quantified by flow cytometry and
immunofluorescence, and high-titre viruses (greater than 60% transduction efficiency) were
chosen for further human iPS cell reprogramming experiments. Human fibroblast
BJ cells were seeded at 0.14 million cells per well of a 6-well plate. Twenty-eight
hours after seeding, 10-15 µl of concentrated OKSM, OKSM plus PRDM14, or
OKSM plus CBFA2T2 virus combinations were used to transduce human
fibroblasts with polybrene (final concentration of 8 µg ml⁻¹). Forty hours
post-transduction, tdTomato and GFP-positive cells were sorted and seeded onto irradiated
mouse embryonic fibroblast (MEF) feeders (GLOBASTEM). Cells undergoing 
reprogramming were maintained on human fibroblast medium for the first week, 
and transferred into 100% human iPS culture medium at the end of the second 
week. At the end of the third week on feeders, human iPS cells were tested for live 
TRA-1-81 staining. Anti-human TRA-1-81 (Podocalyxin) Biotin solution (1:200 
dilution in 4% FBS PBS solution) was directly added to each well. After washing, 
fluorescent secondary antibody Alexa Fluor 660 Streptavidin (1:200) was used for 
imaging. For colony counting, TRA-1-81 staining with secondary antibody con- 
jugated with HRP was used with substrate DAB (Vector Labs, #SK-4100)**. Error 
bars are based on three biological replicates of each condition. 

Cbfa2t2-knockout mice. We generated Cbfa2t2-knockout mice via zygotic
injection”*. T7-gRNA DNA template was PCR amplified from one of the 
Cas9-gRNA plasmids for generation of Cbfa2t2-knockout ES cells. The gRNA 
sequence is ACTCTCTTGGCAGGCGGTTC. The primer sequences are 
TTAATACGACTCACTATAGGGAGAATGGACTATAAGGACCACGAC and 
GCGAGCTCTAGGAATTCTTAC. Subsequently, T7-gRNA was generated by 
in vitro transcription with MEGAshortscript T7 kit. Similarly, Cas9 mRNA was 
generated with mMESSAGE mMACHINE T7 ULTRA kit (Life Technologies). 
Injection mix contained 100 ng µl⁻¹ Cas9 mRNA and 50 ng µl⁻¹ gRNA. Cytoplasmic
injection was performed on 102 C57BL/6 zygotes. Of those, 72 embryos were 
transferred to three pseudopregnant female mice. A total of 30 pups were born 
and genotyped. Genotyping primers were TAGCAGTCTTCCTGCTTTGG and 
CTTCTCGGTGTTCTAGCATCTT. Ten top potential off-target sites were tested 
by PCR sequencing. The ten primer pair sequences are listed in Supplementary 
Table 3. Crossing CRISPR mutant mice containing one allele of indel mutation 
with wild-type C57BL/6 mice resulted in Cbfa2t2+/− mice. Intercrossing of these
mice produced full Cbfa2t2-knockout mice (Cbfa2t2−/−). Mouse studies were
approved by the New York University Medical Center Institutional Animal Care 
and Use Committee. 

Sperm counts and motility analysis and in vitro fertilization. Individual caudal 
epididymis was minced in 90 µl MBCD medium. After 30 min incubation at 37°C,
sperm were separated by pipetting and passaging through a 70-µm filter. Sperm
counts and motility assessment were performed by using the DRM-600 CELL-VU 
Sperm Counting Cytometer. For in vitro fertilization (IVF), 28 egg donors were 
used in each experiment. 

Tissue staining of sections. Mutant testes were weighed before fixation. Ovaries
and testes were fixed in Bouin's fixative for 2-6 h, washed with PBS overnight,
dehydrated with ethanol solution, embedded in paraffin and sectioned at 5 µm.
Sections were stained by H&E. P0 testes were fixed with 4% PFA for 15 min. Slides
of 10-µm cryosections were stained with MVH antibody (1:250).

Whole-mount immunofluorescence analysis and alkaline phosphatase staining. 
Embryo isolation and staging were done as described previously’. The immunoflu- 
orescence analysis and alkaline phosphatase staining were performed essentially 
as described previously’. The primary antibodies used were as follows: anti-AP-2γ
rabbit polyclonal, 1:500 (catalogue no. sc-8977; Santa Cruz Biotechnology);
anti-EHMT1/GLP mouse monoclonal, 5 µg ml⁻¹ (catalogue no. PP-B0422-00; R&D
Systems); anti-H3K9me2 mouse monoclonal, 1:500 (catalogue no. ab1220; 
Abcam); anti-SOX2 goat polyclonal, 1:200 (catalogue no. sc-17320, Santa Cruz 
Biotechnology). The following secondary antibodies from Molecular Probes 
were used at a dilution of 1:500: Alexa Fluor 488 goat anti-rabbit IgG; Alexa Fluor 
568 goat anti-mouse IgG. The stained embryos were mounted with Vectashield 
Antifade Mounting Medium (catalogue no. H-1000; Vector Laboratories). The 
immunofluorescence images and the alkaline phosphatase staining images were 
taken by a confocal microscope (Zeiss LSM880) and a stereomicroscope (Leica 
M80), respectively. The image analyses were done by using ImageJ/Fiji software. 

Histone modification quantification. Histones from mouse KH2 and knockout
mutant ES cells were purified by acid extraction*®. Approximately 100 µg
histones were derivatized with propionic anhydride. The reaction was performed
twice, and the histones were then digested with trypsin. The newly formed N termini
were then derivatized with propionic anhydride twice. The resulting peptides were
purified with C18 stage-tip for MS analysis. Desalted histone peptides (1 µg) were
then loaded onto and separated by reversed-phase high-performance LC (HPLC) on a
Thermo Scientific EASY-nLC 1000 system with a 75 µm i.d. × 15 cm (internal
diameter and length) Reprosil-Pur C18-AQ 3 µm nanocolumn run at 300 nl min⁻¹.
Peptides were eluted with a gradient from 2% to 30% ACN (35 min) and to 98% 
ACN over 20 min in 0.1% formic acid. The HPLC was coupled to a Thermo 
Scientific Orbitrap Elite Hybrid Ion Trap-Orbitrap mass spectrometer. In each 
cycle, one full MS Orbitrap detection was performed with the scan range of 290 
to 1,400 m/z, a resolution of 60 K and AGC of 1 x 10°. Then, data-dependent 
acquisition mode was applied with a dynamic exclusion of 30s. MS2 scans were 
followed on parent ions from the most intense ones. Ions with a charge state 
of one were excluded from MS/MS. An isolation window of 2 m/z was used. 
Ions were fragmented using collision induced dissociation (CID) with a collision 
energy of 35%. Iontrap detection was used with normal scan range mode and 
normal scan rate. The resolution was set to be 15 K with AGC of 1 x 10*. Targeted 
scans were performed on a number of peptides to increase the identification of 
low-abundance modifications. Histone PTM quantification was performed by 
using in-house-developed software EpiProfile*’. 

Extended Data Figure 1 | Biochemical interaction between PRDM14
and CBFA2T2. a, Mass spectrometry peptide counts from Flag affinity
purification from NCCIT control cells and stable lines expressing
Flag-HA-PRDM14 (PRDM14-F), and Flag-HA-CBFA2T2 (CBFA2T2-F).
b, Characterization of in-house human PRDM14 antibody. Western
blot performed using 30 µg of NCCIT and KH2 mES cell lysate. Human
PRDM14 antibody is specific and does not cross-react with mouse
PRDM14. c, Immunoprecipitation (IP) using antibodies against the
indicated endogenous proteins in mES cells. d, Western blot of Superose 6
column fractionation of Flag-purified CBFA2T2 complex in NCCIT cells
stably expressing Flag-HA-CBFA2T2. e, ChIP analysis using the indicated
antibodies in 293T-REx harbouring a UAS-TK-Luciferase transgene.
Fold enrichment represents the ratio of enrichment by ChIP-qPCR
upon induction of GAL4-PRDM14 expression via doxycycline addition.
Positions of the primer set are indicated by small arrows in the schematic.
qPCR source data are included in the Supplementary Information.

Extended Data Figure 2 | PRDM14 and CBFA2T2 exhibit an
overlapping and interdependent distribution on chromatin in NCCIT
cells. a, Heat map depicting PRDM14, CBFA2T2, RING1B and SUZ12
read density across a 5-kb window centred about the PRDM14 (top) or
[...] genome browser tracks depicting SERs at the indicated genomic loci.
c, Gene Ontology (GO) analysis of PRDM14 and CBFA2T2 common
target genes. d, Western blot analysis of PRDM14 and CBFA2T2 protein
levels in knockdown (KD) experiments (Fig. 1d, e).

Extended Data Figure 3 | Characterization of knockout ES cell mutants
and quantification of human iPS cell reprogramming efficiency.
a, b, Strategy for generating Prdm14- and Cbfa2t2-knockout (KO) mES
cells via CRISPR-Cas9 genome editing. Sequencing chromatograms
confirming homozygous disruption of the locus are depicted. c, Cbfa2t2-
and Prdm14-knockout ES cells require 2i to maintain growth. ES cell
lines generated under FBS plus LIF plus 2i conditions were continuously
cultured in FBS plus LIF plus 2i (top, middle), or switched to FBS plus LIF
(bottom). Eight days after 2i withdrawal (FBS plus LIF), well-formed ES
cell colonies were undetectable; instead, mutant ES cells appeared to be
differentiated. Scale bar, 100 µm. d, Proliferation rates of wild-type
(WT) and mutant knockout ES cells as described in c. Data were obtained
from three biological replicates. Please note error bars shown in the plots.
Owing to the logarithmic scale used here, some error bars are very small
and might be invisible. e, RNA-seq MA plot (log ratio (M) versus mean
average (A)) in the indicated ES cells. Data are representative of three
biological replicate experiments for each line. Mean abundance is plotted
on the x axis and enrichment (both in log2 scale) is plotted on the y axis.
Genes depicted in red are differentially expressed with a FDR < 0.0001.
f, Heat map showing relative expression of all differentially expressed
genes as described in Fig. 2c. The only difference is now the heat map
is centred on CBFA2T2 differentially expressed genes, rather than
PRDM14 differentially expressed genes. g, Scheme of human fibroblast
reprogramming to iPS cells. Fibroblasts were transduced with lentiviruses
expressing polycistronic OCT4/KLF4/SOX2/c-MYC (OKSM) and
either PRDM14 or CBFA2T2. Three weeks later, bright-field images of
successfully reprogrammed colonies (left) and live TRA-1-81 staining
(right) were recorded. Scale bar, 500 µm. h, Quantification of human iPS
cell reprogramming efficiency based on TRA-1-81 staining with secondary
antibody conjugated with horseradish peroxidase (HRP) and substrate
DAB. Error bars are based on four biological replicates of each condition.
The source data are included in the Supplementary Information.

Extended Data Figure 4 | Cbfa2t2−/− mouse genotypes and sperm
defects. a, One representative Cbfa2t2−/− mouse genotype wherein a 7-bp
fragment is deleted. b, Testes of multiple wild-type (n = 4) and Cbfa2t2−/−
(n = 4) male mice at 8 weeks old were dissected and weighed. c, Number
of sperm in the epididymis of Cbfa2t2+/+ (n = 4) and Cbfa2t2−/− (n = 4)
mice is shown with standard error of the mean. P value was determined
by Student’s t-test. d, Near loss of gonocytes in Cbfa2t2-knockout mutant
P0 testes by DDX4 (MVH) staining. Visualization of MVH-positive (red)
gonocytes in Cbfa2t2+/− (top) or Cbfa2t2−/− (bottom) testis at P0 stage.
The merged images with Hoechst (left; white) are shown on the right.
Scale bars, 100 µm. e, Numbers of AP-2γ-positive PGCs in Cbfa2t2+/+
(black), Cbfa2t2+/− (grey) and Cbfa2t2−/− (red) embryos at the indicated
embryonic stages. LS, late-streak stage; EB, MB, and LB, early-, mid-, and
late-bud stage; EHF, early-head fold stage; 2 st., 2 somites stage. Student’s
t-test: *P = 0.03, **P = 0.003. f, Numbers of AP-2γ-positive PGCs in
Cbfa2t2+/+ (black), Cbfa2t2+/− (grey) and Cbfa2t2−/− (red) embryos
at the indicated embryonic stages. 0B, zero-bud stage; LHF, late-head
fold stage. g, Left, expression of SOX2 (red) in AP-2γ-positive (green)
PGCs in Cbfa2t2+/+ (top) or Cbfa2t2−/− (bottom) embryo at mid-bud
stage, E7.25, shown as z-projection images of posterial confocal sections.
Arrow indicates a minor PGC with relatively normal activation of SOX2.
Scale bar, 50 µm. Right, percentage of SOX2-positive (red) cells in
AP-2γ-positive (green) PGCs in the indicated genotypes of Cbfa2t2 at
E7.0-7.25 (zero- to mid-bud stage) are shown with statistical significance
(Student's t-test: *P = 0.0006, **P = 0.0001; Cbfa2t2+/+, n = 7; Cbfa2t2+/−,
n = 5; Cbfa2t2−/−, n = 5).

Extended Data Figure 5 | Cbfa2t2-m7 mutant characterization and
the related mechanism. a, Gene Ontology (GO) analysis of PRDM14
ChIP-seq target genes. PRDM14 target genes are enriched in histone
methyltransferase activities by DAVID functional annotation tool analysis.
b, Cbfa2t2-m7 mutant genotyping. The mutant 7 amino acids are in red
and corresponding wild-type (WT) residues are highlighted in blue in
the displayed protein sequences. c, Bright-field images of wild-type and
Cbfa2t2-m7 mES cells. Scale bar, 100 µm. d, Western blot analysis of
PRDM14, CBFA2T2 and OCT4 protein levels in Prdm14-knockout
(KO), Cbfa2t2-knockout or m7 mutant ES cells under feeder-free FBS
plus LIF plus 2i condition. Nonspecific bands are denoted with an
[...] OCT4 at selected OCT4 target genes. Occupancy is compared between
wild-type, Cbfa2t2-knockout and Cbfa2t2-m7 mES cells. ChIP-qPCR
primer sequences are included in Supplementary Table 3. f, RT-qPCR
quantification of Ehmt1 mRNA levels in wild-type and mutant lines.
P values are 0.004 (**) and 0.0142 (*). The source data are included in
the Supplementary Information. g, Mass spectrometry quantification of
histone H3K9 modifications in wild-type and mutant lines. P values are
0.00956, 0.04248 (*). The source data are included in the Supplementary
Information. h, i, Additional immunofluorescence analysis of H3K9me2
(red) of AP-2γ-positive (green; arrowheads) PGCs in Cbfa2t2+/− and
Cbfa2t2−/− embryos at E8.75 as described in Fig. 4i.

Dissecting direct reprogramming from fibroblast to neuron using single-cell RNA-seq

Barbara Treutlein, Qian Yi Lee, J. Gray Camp, Moritz Mall, Winston Koh, Seyed Ali Mohammad Shariati,
Sopheak Sim, Norma F. Neff, Jan M. Skotheim, Marius Wernig & Stephen R. Quake


Direct lineage reprogramming represents a remarkable 
conversion of cellular and transcriptome states (refs 1-3). However,
the intermediate stages through which individual cells progress 
during reprogramming are largely undefined. Here we use single- 
cell RNA sequencing*” at multiple time points to dissect direct 
reprogramming from mouse embryonic fibroblasts to induced 
neuronal cells. By deconstructing heterogeneity at each time 
point and ordering cells by transcriptome similarity, we find that 
the molecular reprogramming path is remarkably continuous. 
Overexpression of the proneural pioneer factor Ascl1 results in a 
well-defined initialization, causing cells to exit the cell cycle and 
re-focus gene expression through distinct neural transcription 
factors. The initial transcriptional response is relatively 
homogeneous among fibroblasts, suggesting that the early steps 
are not limiting for productive reprogramming. Instead, the 
later emergence of a competing myogenic program and variable 
transgene dynamics over time appear to be the major efficiency 
limits of direct reprogramming. Moreover, a transcriptional state, 
distinct from donor and target cell programs, is transiently induced 
in cells undergoing productive reprogramming. Our data provide 
a high-resolution approach for understanding transcriptome states 
during lineage differentiation. 

Direct lineage reprogramming bypasses an induced pluripotent stage 
to directly convert somatic cell types. Using the three transcription 
factors Ascl1, Brn2 and Myt1l (BAM), mouse embryonic fibroblasts 
(MEFs) can be directly reprogrammed to induced neuronal (iN) cells 
within 2 to 3 weeks at an efficiency of up to 20%®. Several groups have 
further developed this conversion using transcription factor combi- 
nations that almost always contain Ascl1 (refs 9-12). Recently, one of 
our groups showed that Ascl1 is an ‘on target’ pioneer factor initiating 
the reprogramming process", and inducing conversion of MEFs into 
functional iN cells alone, albeit at a much lower efficiency compared 
to BAM". These findings raised the question whether and when a 
heterogeneous cellular response to the reprogramming factors occurs 
during reprogramming and which mechanisms might cause failure 
of reprogramming. We hypothesized that single-cell RNA sequencing 
(RNA-seq) could be used as a high-resolution approach to reconstruct 
the reprogramming path of MEFs to iN cells and uncover mechanisms 
limiting reprogramming efficiencies*!>'®. 

In order to understand transcriptional states during direct conversion
between somatic fates, we measured 405 single-cell transcriptomes
(Supplementary Data 1) at multiple time points during iN cell
reprogramming (Fig. 1a and Extended Data Fig. 1a). We first explored how
individual cells respond to Ascl1 overexpression during the initial phase
of reprogramming. We analysed day 0 MEFs and day 2 cells induced
with Ascl1 only (hereafter referred to as Ascl1-only cells) using PCA
and identified three distinct clusters (A, B, C), which correlated with the
level of Ascl1 expression (Fig. 1b-e). Cluster A consisted of all control
d0 MEFs and a small fraction of day 2 cells (~12%) which showed
no detectable Ascl1 expression, suggesting these day 2 cells were not
infected with the Ascl1 virus. This is consistent with typical Ascl1
infection efficiencies of about 80-90%. We found that the day 0 MEFs
were surprisingly homogeneous, with much of the variance due to cell
cycle (Extended Data Fig. 1b-g, Supplementary Data 3, Supplementary
Information). Cluster C was characterized by high expression of Ascl1,
Ascl1-target genes (Zfp238, Hes6, Atoh8 and so on) and genes involved
in neuron remodelling, as well as the downregulation of genes involved
in cell cycle and mitosis (Fig. 1c, e, f and Supplementary Data 2).
Cluster B cells represent an intermediate population that expressed
Ascl1 at a low level, and were characterized by a weaker upregulation
of Ascl1-target genes and less efficient downregulation of cell cycle
genes compared to cluster C cells. This suggests that an Ascl1
expression threshold is required to productively initiate the reprogramming
process. In addition, we found that forced Ascl1 expression resulted in
less intracellular transcriptome variance, a lower number of expressed
genes (Fig. 1d) and a lower total number of transcripts per single cell
(Extended Data Fig. 2a, b). Notably, the distribution of average
expression levels per gene was similar for all experiments independent of
Ascl1 overexpression (Extended Data Fig. 2c). We observed that the
upregulation of neuronal targets and downregulation of cell cycle genes
in response to Ascl1 expression are uniform, indicating that the initial
transcriptional response to Ascl1 is relatively homogeneous among all
cells (Fig. 1e). This suggests that most fibroblasts are initially competent
to reprogram and later events must be responsible for the moderate
reprogramming efficiency of about 20%.
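
To make the PCA step concrete, the sketch below extracts the first principal component of a small cells-by-genes expression matrix with power iteration in plain Python. It is an illustration, not the authors' pipeline: the function name, the starting vector and the toy matrix are assumptions, and a real analysis would use a numerical library and log-transformed expression values.

```python
from typing import List, Sequence, Tuple


def first_principal_component(
    matrix: Sequence[Sequence[float]], n_iter: int = 200
) -> Tuple[List[float], List[float]]:
    """Return (gene loadings, per-cell scores) for PC1 of a cells x genes matrix."""
    n_cells, n_genes = len(matrix), len(matrix[0])
    # Centre each gene (column) on its mean across cells.
    means = [sum(row[j] for row in matrix) / n_cells for j in range(n_genes)]
    centred = [[row[j] - means[j] for j in range(n_genes)] for row in matrix]

    def project(v: Sequence[float]) -> List[float]:
        return [sum(r[j] * v[j] for j in range(n_genes)) for r in centred]

    # Deterministic, non-uniform start so it is unlikely to be orthogonal to PC1.
    v = [1.0 / (j + 1) for j in range(n_genes)]
    for _ in range(n_iter):
        scores = project(v)
        # Multiply by X^T X without forming the covariance matrix.
        w = [sum(centred[i][j] * scores[i] for i in range(n_cells))
             for j in range(n_genes)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:            # no variance left along this direction
            break
        v = [x / norm for x in w]
    return v, project(v)


if __name__ == "__main__":
    # Toy data: two cells high in gene 0, two cells high in gene 1.
    cells = [[10, 0, 1], [9, 1, 1], [0, 10, 1], [1, 9, 1]]
    loadings, scores = first_principal_component(cells)
    print([round(s, 2) for s in scores])
```

Cells with similar PC1 scores would then be grouped, for example by hierarchical clustering on the genes with the largest positive and negative loadings.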

To explore the effect of transgene copy number variation on the
heterogeneity of the early response, we analysed single-cell
transcriptomes of an additional 47 cells induced with Ascl1 for two days
from secondary MEFs derived via blastocyst injection from a clonal,
Ascl1-inducible embryonic stem cell line. As expected, the induction
efficiency of Ascl1 was 100% since the secondary MEFs are genetically
identical and all cells carry the transgene in the same genomic location
(Fig. 1g). Nevertheless, these clonal MEFs had similar transcriptional
responses and heterogeneity as primary infected MEFs at the day 2 time
point, as well as comparable reprogramming efficiencies and maturation
(Extended Data Fig. 3a). Finally, we compared the early response in our
Ascl1-only single-cell RNA-seq data with our previously reported bulk
RNA-seq data of Ascl1-only and BAM-mediated reprogramming”
(Extended Data Fig. 3b). We found similar downregulation of
MEF-related genes and upregulation of pro-neural marker genes in both

Figure 1 | Ascl1 overexpression elicits a homogeneous early response
and initiates expression of neuronal genes. a, Mouse embryonic
fibroblasts stably integrated with neuronal reporter Tau-eGFP® were
directly transformed to neuronal cells through overexpression of a single
(Ascl1), or three factors (Brn2, Ascl1, Myt1l; BAM) as described®. Cells
were sampled using single-cell RNA-seq at day 0 without infection (d0,
73 cells), day 2 (d2, 81 cells Ascl1-infected and 47 cells clonal), day 5 (d5,
55 cells, eGFP+ and eGFP− cells), day 20 (d20, 33 cells, eGFP+ cells),
and day 22 (d22, 73 cells, eGFP+ cells) post-induction with Ascl1. As
a comparison, cells reprogrammed using all three BAM factors were
analysed at 22 days (d22, 43 cells, eGFP+ cells). b, c, PCA of single-cell
transcriptomes from day 0 MEFs (circle, 73 cells) and day 2 Ascl1-induced
cells (square, 81 cells) shows reduced intercellular variation at day 2. Points
are coloured based on hierarchical clustering shown in e (b), or Ascl1
expression (c). d, Left, distribution of transcriptome variance within single
cells grouped by cluster assignment of b and e shows that Ascl1 expression
reduces the intracellular transcriptome variance. Right, distribution
of total number of genes expressed by single cells grouped by cluster
assignment shows that Ascl1 overexpression reduces the range of gene
expression. e, Hierarchical clustering of day 0 and day 2 cells (rows) using
the top 50 genes (columns) correlating positively (genes I) and negatively
(genes II) with PC1. Cells are clustered into three clusters (left sidebar):
A (83 cells, MEFs), B (20 cells, intermediates), C (51 cells, day 2 induced
cells). f, Top gene ontology enrichments of genes I and II (d) are shown
with Bonferroni-corrected P values. BP, biological process; CC, cellular
component; reg. exc. memb. pot., regulation of excitatory postsynaptic
membrane potential. g, Distribution of PC1 loadings are shown for day 2
cells carrying variable numbers of Ascl1 transgene copies (dark green,
Ascl1-infected) or carrying the same Ascl1 copy number and genomic
location (yellow, clonal). PC1 effectively separates un-induced MEFs
(cluster A) from induced cells highly expressing Ascl1-target genes
(cluster C) and both Ascl1-infected and clonal cells productively initiate
reprogramming. The induction efficiency is higher for clonally induced
MEFs; however, even in the clonal population Ascl1 induction is variable.


Ascl1- and BAM-mediated reprogramming. These data suggest that
the overexpression of Ascl1 focuses the transcriptome and directs the 
expression of target genes. 

We next analysed the transcriptomes of reprogramming cells on day 5.
At this time point, the first robust Tau-eGFP signal can be detected
in successfully reprogramming cells and we therefore purified
40 Tau-eGFP+ and 15 Tau-eGFP− cells for transcriptome analysis
by fluorescence-activated cell sorting. We found that Tau-eGFP−
cells lacked expression of neuronal Ascl1-target genes (genes B), and
maintained expression of fibroblast-associated genes (genes A and C;
Fig. 2a, b, Extended Data Fig. 4a, Supplementary Data 4). In addition,
we found a positive correlation (R² = 0.49) between Ascl1
expression and Tau-eGFP intensities (Extended Data Fig. 4b, Fig. 2a, b).
Quantitative real-time (qRT)-PCR and western blot analysis of Ascl1
expression on day 5 to day 12 Tau-eGFP-sorted cells validated a
significant decrease in Ascl1 expression in Tau-eGFP− cells compared
to Tau-eGFP+ cells (Fig. 2c, Supplementary Data 5). Thus, Ascl1
expression is correlated to Tau-eGFP levels and expression of
neuronal genes at day 5. This raises the hypothesis that Ascl1 is silenced
in cells that fail to reprogram. Alternatively, cells with low or no Ascl1
expression at day 5 and day 22 might have never highly expressed
Ascl1. To distinguish between these two mechanisms, we used live cell
microscopy to track cells over a time course from 3-6 days after Ascl1
induction using an eGFP-Ascl1 fusion construct (Fig. 2d, Extended
Data Fig. 5). We immunostained the cells at day 6 using Tuj1
antibodies recognizing the neuronal β3-tubulin (Tubb3) to identify cells that
differentiated towards neuronal fate. We found that transgenic Ascl1
protein levels varied substantially over time and, on average,
continued to increase over time in Tuj1+ cells, but decreased or plateaued
in Tuj1− cells, leading to a significant difference in Ascl1 expression
within six days of Ascl1 induction (Fig. 2e, Extended Data Fig. 4c).
This time-lapse analysis demonstrated that Ascl1 is silenced in many
cells that fail to reprogram.

We next analysed the maturation events occurring during late 
reprogramming stages. We performed principal component analysis 
(PCA) on the single-cell transcriptomes of all reprogramming stages 
analysed, including day 22 cells reprogrammed with Ascl1 alone or
with all three BAM factors (Extended Data Fig. 6a). PC1 separated 
MEFs and early time points (day 2, day 5) from most of the day 22 cells. 
Surprisingly, PC2 separated most day 22 BAM cells from day 22 Ascl1- 
only cells despite robust Tau-eGFP expression in both groups. We used 
t-distributed stochastic neighbour embedding (tSNE) to organize 
all day 22 cells into transcriptionally distinct clusters, and identified 
differentially expressed genes marking each cluster (Fig. 3a). We identified 
3 clusters, which contained cells expressing neuron (Syp), fibroblast 
(Eln), or myocyte (Tnnc2) marker genes, respectively (Fig. 3b). 
Consistent with this marker gene expression, cells in each cluster had 
a maximum correlation with bulk RNA-seq data from purified neurons, 
embryonic fibroblasts, or myocytes (Fig. 3c). Neuron- and myocyte-like 
cells expressed a clear signature of each cell type (Fig. 3d). Although we 
observed cells with complex neuronal morphologies in the Ascl1-only
reprogramming experiments as we had reported previously'* (Fig. 3e), 
their frequency was too low to be captured in the single-cell RNA-seq 
experiments. All of the day 22 Ascl1-only cells, and 33% of BAM cells 
had their highest correlation with myocytes or fibroblasts.

We applied an analytical technique based on quadratic programming 
to quantify fate conversion and to predict when during reprogram- 
ming the alternative muscle program emerges (Extended Data Fig. 6b). 
This method allowed us to decompose each single cell’s transcrip- 
tome and express each cell’s identity as a linear combination of the 
transcriptomes from the three different observed fates (neuron, MEF, 
myocyte; Supplementary Data 6). Using this method, we observed that 
there is an initial loss of MEF identity concomitant with an increase in 
neuronal and myocyte identity over the first five days of Ascl1
reprogramming. The neuronal identity is maintained and matures in day 22
cells transduced with BAM (Extended Data Fig. 6c). However, the day 
22 Ascl1-only cells failed to mature to neurons and adopted a
predominantly myogenic transcriptional program. This divergence was already
apparent in some day 5 cells (Extended Data Fig. 6d, e). These findings 
raised the question whether the additional two reprogramming factors 
Brn2 and Myt1l suppress the aberrant myogenic program. Compatible 
with this notion, we observed that Brn2 and Myt1l had low expression

Figure 2 | Transgenic Ascl1 silencing explains early reprogramming
failure. a, Hierarchical clustering of day 5 cells using genes correlating
positively and negatively with PC1 and PC2 from PCA of day 5 Ascl1-only
cells. Note that eGFP fluorescence intensity and Ascl1 mRNA expression
shown in the left side bar appear correlated. b, Violin plots show the
distribution of Ascl1 and neuronal marker Tubb3 in day 0 MEFs, as well
as Tau-eGFP+ and Tau-eGFP− day 5 cells. c, RT-PCR for exogenous
Ascl1 expression (top, n = 4, biological replicates) and western blot of
Ascl1 protein levels (bottom, Supplementary Data 5) for unsorted control
MEFs and day 2 cells (NA, not applicable), as well as day 5, day 7, day 10
and day 12 cells FAC-sorted using Tau-eGFP as a neuronal marker. Both


in the five day 22 BAM cells that expressed a myogenic program. To 
directly address this question, we infected MEFs with Ascl1 alone 
or in combination with Brn2 and/or Myt1l and assessed myogenic

Figure 3 | iN cell maturation competes with an alternative myogenic cell
fate that is repressed by Brn2 and Myt1l. a, tSNE reveals alternative cell fates
that emerge during direct reprogramming. Shapes and colours indicate the
day 20/22 Ascl1-only (dark green) or day 22 BAM-induced (blue) cells. Note
that all cells are Tau-eGFP+. b, c, tSNE plot from a with cells coloured based
on expression level of marker genes (b), or correlation with bulk RNA-seq
data from different purified cell types (neurons”*, myocytes”, fibroblasts’; c).
d, Heat map showing expression of genes marking the two alternative fates
in day 20/22 Ascl1-only (upper sidebar, dark green) and day 22 BAM (upper
sidebar, blue) Tau-eGFP+ cells. Genes (rows) have the highest positive and
negative correlation with the first principal component in a PCA on
all day 20/22 cells and all genes. Columns represent 121 single cells, ordered
based on their correlation coefficient with the first principal component.
Lower sidebars, Ascl1 transcript level and Tau-eGFP fluorescence for each
cell. e, Immunofluorescent detection of Tau-eGFP (green), DAPI (blue),
Myh3 (red) and Tubb3 (cyan) for day 22 cells infected with Ascl1 alone, or
with all BAM factors. Images are representative of four biological replicates.
Right, mean fractions of eGFP+ cells that express either Tubb3 or Myh3. Only
Tubb3+ cells with a neuronal morphology were counted. Six or seven images
were analysed for each of four biological replicates. Error bars, s.e.m.

RNA and protein levels of Ascl1 are significantly higher in Tau-eGFP+
cells, and gradually decrease in Tau-eGFP− cells (*P < 0.05, **P < 0.01,
***P < 0.001, two-tailed t-test; error bars, s.e.m.). d, Schematic for live
cell imaging experiment. CD1 MEFs were infected with an eGFP-Ascl1
construct at −1 day, induced with doxycycline at day 0, switched to N3
media at day 1 and imaged between 3 and 6 days post doxycycline. Cells
were fixed at 6 days and stained for Tubb3 expression. e, Average
eGFP-Ascl1 intensity (error bars, s.e.m.) was plotted at 45-min intervals for
Tuj1+ (n = 10) and Tuj1− (n = 12) cells between day 3 and day 6. Tuj1+
cells significantly (one-tailed t-test, P < 0.05) increased Ascl1 expression
through time compared to Tuj1− cells, which appeared to silence Ascl1.


and neurogenic fates at day 22 based on immunostaining and
qRT-PCR (Fig. 3e, Extended Data Fig. 6f-i). Indeed, myocyte markers
(Myh3, Myo18b, Tnnc2) were upregulated in Tau-eGFP-positive
versus negative cells and were strongly repressed when Brn2 and/or
Myt1l was overexpressed together with Ascl1. Moreover, Brn2 and
Myt1l enhanced the expression of the synaptic genes Gria2, Nrxn3,
Stmn3, and Snap25 but not the immature pan-neuronal genes
Tubb3 and Map2. As expected, fibroblast markers were repressed in
Tau-eGFP+ cells.

We next set out to reconstruct the reprogramming path from 
MEFs to iN cells. By deconstructing heterogeneity at each time point 
as described above, we removed cells that appeared stalled in
reprogramming due to Ascl1 silencing or cells converging on the alternative
myogenic fate. We used quadratic programming to order the cells based 
on fractional similarity to MEF and neuron bulk transcriptomes. This 
revealed a continuum of intermediate states through the 22-day repro- 
gramming period (Fig. 4a, b). Notably, the total number of transcripts 
per single cell decreased as a function of fractional neuron identity 
(Extended Data Fig. 7a). Our ordering of cells based on fractional iden- 
tities correlated well with pseudotemporal ordering using Monocle’*, 
an alternative algorithm for delineating differentiation paths (Extended 
Data Fig. 7b-d). Heat map visualization of genes identified by PCA 
of all cells on the iN cell lineage revealed two gene regulatory events 
during reprogramming with many cells at intermediate stages (Fig. 4c, 
Supplementary Data 7). First, there is an initiation stage where MEFs 
exit the cell cycle upon Ascl1 induction, and genes involved in mitosis
are turned down or off (such as Birc5, Ube2c, Hmga2). Concomitantly, 
genes associated with cytoskeletal reorganization (Sept3/4, Coro2b, 
Ank2, Mtap1a, Homer2, Akap9), synaptic transmission (Snca, Stxbp1, 
Vamp2, Dmpk, Ppp3ca), and neural projections (Cadm1, Dner, Klhl24,
Tubb3, Mapt (Tau)) increase in expression. This indicates that Ascl1
induces genes involved in defining neuronal morphology early in the 
reprogramming process. The initiation phase is followed by a maturation
stage whereby MEF extracellular matrix genes are turned off and
genes involved in synaptic maturation are turned on (Syp, Rab3c, Gria2, 
Syt4, Nrxn3, Snap25, Sv2a). These results are consistent with previous 
findings that Tuj1+ cells with immature neuron-like morphology can
be found as early as three days after Ascl1 induction, while functional 
synapses are only formed 2 to 3 weeks into the reprogramming pro- 
cess®. Finally, we constructed a transcription regulator network on the 
basis of pairwise correlation of transcription regulator expression across 
all stages of the MEF-to-iN cell reprogramming. This revealed three 
densely connected sub-networks identifying transcription regulators 
influencing MEF cell biology, iN cell initiation, and iN cell maturation

Figure 4 | Reconstructing the direct reprogramming path from
MEFs to iN cells. a, Top, for each cell on the iN cell reprogramming
path, the similarity to bulk RNA-seq from either MEFs! or neurons”
was calculated using quadratic programming and plotted as fractional 
identities (left axis, circle, fractional MEF identity; right axis, triangle, 
fractional neuron identity). Points are coloured based on the experimental 
time point. Bottom, Lagrangian residuals of the quadratic programming 
for each single cell ordered based on their fractional identity as above. 
Points are coloured based on the experimental time point. b, Fractional 
neuron identities of all cells on the iN cell reprogramming path are shown 
as a function of the experimental time point. c, Ordering of single cells 
(rows) according to fractional neuron identity revealed a cascade of gene 
expression changes leading to neuronal identity. Genes (columns) with 
the highest positive and negative correlation to PC1 and PC2 are shown. 
Left sidebars, experimental time point (green/blue) and fractional neuron 


(Fig. 4d, Extended Data Fig. 8, Supplementary Data 8, Supplementary 
Information). Notably, Ascl1 was found to strongly positively correlate 
with the transcription regulators in both the initiation and maturation 
subnetworks and negatively correlate with transcription regulators
specific to MEFs. These data corroborate evidence that persistent Ascl1
expression is required to maintain chromatin states conducive to iN 
cell maturation’. 

It has been suggested that direct somatic lineage reprogramming 
may not involve an intermediate progenitor cell state as seen dur- 
ing induced pluripotent stem cell differentiation!”-!°. However, our 
fractional analysis showed that the identity of intermediate repro- 
gramming cells could not be explained by a simple linear mixture of 
the differentiated fibroblast and neuron identities, as revealed by an 
intermediary increase of Lagrangian residuals (Fig. 4a). Therefore, 
we tested whether a neural precursor cell (NPC) state is transiently 
induced by adding NPC bulk transcriptome data along with that of 
MEFs and neurons into the quadratic programming analysis (Fig. 4e). 
We found that the fractional NPC identity of cells increased specif- 
ically for cells at intermediate positions on the MEF-to-iN cell line- 
age path, and then decreased as a function of iN cell maturation. In 
addition, several NPC genes (that is, Gli3, Sox9, Nestin, Fabp7, Hes1) 
are expressed in intermediates of the iN cell reprogramming path”° 
(Fig. 4f). However, canonical NPC marker genes such as Sox2 and 
Pax6 were never induced. This indicates that cells do not go through a 
canonical NPC stage, yet a unique intermediate transcriptional state is 
induced transiently that is unrelated to donor and target cell program

identity (yellow/red). Right sidebars, Ascl1 transcript levels (log2[FPKM],
blue/yellow) and eGFP fluorescence intensities (log10[RFU], black/white;
RFU, relative fluorescence units). d, Transcriptional regulator
covariance network during iN cell lineage progression. Shown are nodes
(transcriptional regulators) with more than three edges, with each
edge reflecting a correlation >0.25 between connected transcriptional
regulators. e, Fractional MEF (left axis) or fractional neural precursor cell
(NPC) identities (right axis) are plotted against fractional neuron identity
for single cells on the MEF-to-iN cell lineage. Points are shaped based
on the experiment. f, Expression of selected genes (columns) that mark
NPCs, intermediate progenitor cells (IPCs), neurons, or proliferating cells
(Prolif.) is shown for cells on the iN cell lineage (rows). Left sidebars,
fractional neuron identity (yellow/red) and experimental time point
(green/blue).


similar to that which was observed for induced pluripotent stem cell 
reprogramming”! 

A fundamental question in cell reprogramming is whether there are 
pre-determined mechanisms that prevent the majority of the fibro- 
blasts from reprogramming or whether all donor cells are competent 
to reprogram but the reprogramming procedure is inefficient. We did 
not observe any MEF subpopulations, other than cell cycle variation, 
that suggested differences in the capacity to initiate reprogramming. 
Furthermore, we observed that 48 h after infection the majority of the 
cells induced Ascl1-target genes and silenced MEF-associated genes.
This does not preclude the possibility that underlying epigenetic vari- 
ation in donor cells influences reprogramming outcomes; however, our 
analysis suggests that it is unlikely that MEF heterogeneity contributes 
significantly to reprogramming efficiency. We found that divergence 
from the neuronal differentiation path into an alternative myogenic 
fate, as well as Ascl1 transgene silencing, were both significant factors 
contributing to reprogramming efficiency. Though Ascl1 induces lin- 
eage conversion, it is inefficient in restricting cells to the neuronal fate. 
This suggests that intermediate stages of iN cell progression are unsta- 
ble, perhaps due to epigenetic barriers, and additional factors promote 
cells to permanently acquire neuron-like identity, rather than revert 
to MEF-like or diverge towards the alternative myocyte-like fate. In 
summary, we present a single-cell transcriptomic approach that can 
be used to dissect direct cellular reprogramming pathways or devel- 
opmental programs in which cells transform their identity through a 
series of intermediate states.

METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Cell derivation, cell culture and iN cell generation. Tau-eGFP reporter MEFs, 
tested negative for mycoplasma contamination, were isolated, infected with doxy- 
cycline (dox)-inducible lentiviral constructs and reprogrammed into iN cells as 
previously described®. Day 0 (d0) cells were uninfected MEFs that served as a 
negative control. Day 2 (d2) cells were infected with Ascl1 and harvested two days 
after dox induction. Day 5 (d5) cells were infected with Ascl1, FAC-sorted for
Tau-eGFP+ and Tau-eGFP− cells five days after dox induction and the two cell
populations were mixed again in a 1:1 ratio. Day 20 or 22 (d20/d22) cells were infected
either with Ascl1 alone, or combined with Brn2 and Myt1l, plated with glia seven
days post dox induction, and FAC-sorted for Tau-eGFP+ iN cells 20 or 22 days
after dox induction. Each of these groups was then loaded onto separate
microfluidic mRNA-seq chips for preparation of pre-amplified cDNA from single cells.

Clonal Ascl1-inducible MEFs were derived as previously described'*. Twelve- 
well plates were coated with Matrigel and incubated at 37°C overnight. 350,000 
cells were then plated per well and kept in MEF media. Dox was added a day after 
plating. For single-cell RNA-seq, cells were harvested two days post dox induction
and loaded onto a microfluidic mRNA-seq chip. To evaluate efficiency in repro- 
gramming, MEF + dox media was switched out for N3 + dox media after 48 h, 
and cells were fixed for immunostaining 12 days post dox. 

Capturing of single cells and preparation of cDNA. Single cells were captured on 
a medium-sized (10-17 µm cell diameter) microfluidic RNA-seq chip (Fluidigm)
using the Fluidigm C1 system. Cells were loaded onto the chip at a concentration
of 350-500 cells µl⁻¹, stained for viability (live/dead cell viability assay, Molecular
Probes, Life Technologies) and imaged by phase-contrast and fluorescence 
microscopy to assess number and viability of cells per capture site. For d5 and d22 
experiments, cells were only stained with the dead stain ethidium homodimer 
(emission ~635 nm, red channel) and Tau-eGFP fluorescence was imaged in the 
green channel. Only single, live cells were included in the analysis. cDNAs were 
prepared on chip using the SMARTer Ultra Low RNA kit for Illumina (Clontech). 
ERCC (External RNA Controls Consortium) RNA spike-in Mix (Ambion, Life 
Technologies)”°”” was added to the lysis reaction and processed in parallel to 
cellular mRNA. Tau-eGFP fluorescence intensity of each single cell was deter- 
mined using CellProfiler’® by first identifying the outline of the cell in the image of 
the respective capture site and then integrating over the signal in the eGFP channel. 
RNA-seq library construction and cDNA sequencing. Size distribution and con- 
centration of single-cell cDNA was assessed on a capillary electrophoresis based 
fragment analyser (Advanced Analytical Technologies) and only single cells with 
high quality cDNA were further processed. Sequencing libraries were constructed 
in 96-well plates using the Illumina Nextera XT DNA Sample Preparation kit 
according to the protocol supplied by Fluidigm and as described previously”. 
Libraries were quantified by Agilent Bioanalyzer using High Sensitivity DNA 
analysis kit as well as fluorometrically using Qubit dsDNA HS Assay kits and a 
Qubit 2.0 Fluorometer (Invitrogen, Thermo Fisher Scientific). Up to 110 single-cell 
libraries were pooled and sequenced 100 bp paired-end on one lane of Illumina 
HiSeq 2000 or 75 bp paired-end on one lane of Illumina NextSeq 500 to a depth of 
1-7 million reads. CASAVA 1.8.2 was used to separate out the data for each single 
cell using unique barcode combinations from the Nextera XT preparation and to 
generate *.fastq files. In total, the transcriptomes of 405 cells were measured
from the following eight independent experiments: d0 (73 cells, 1 experiment), d2
(Ascl1-only in regular MEFs, 81 cells, 1 experiment; Ascl1-only in clonal MEFs,
47 cells, 1 experiment), d5 (Ascl1-only, 55 cells, 1 experiment), d20 (Ascl1-only,
33 cells, 1 experiment) and d22 (BAM, 43 cells, 1 experiment; Ascl1-only, 34 and
39 cells, 2 independent experiments). See Supplementary Data 1 for the
transcriptome data for all 405 cells with annotations (quantification in log2[FPKM]).
Processing, analysis and graphic display of single cell RNA-seq data. Raw reads 
were pre-processed with sequence grooming tools FASTQC™, cutadapt*!, and 
PRINSEQ* followed by sequence alignment using the Tuxedo suite (Bowtie*’, 
Bowtie2**, TopHat** and SAMtools**) using default settings. Transcript levels
were quantified as fragments per kilobase of transcript per million mapped reads 
(FPKM) generated by TopHat/Cufflinks”.
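
For orientation, the FPKM unit itself can be written out explicitly. The sketch below is not the Cufflinks implementation, which estimates fragment assignments with a statistical model; it only illustrates the normalization implied by the unit. The function name and the example numbers are hypothetical.

```python
def fpkm(fragments: float, transcript_length_bp: float,
         total_mapped_fragments: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions_mapped)


# Hypothetical numbers: 200 fragments on a 2,000-bp transcript,
# in a library of 4 million mapped fragments.
print(fpkm(200, 2_000, 4_000_000))  # 25.0
```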

After seven days of reprogramming, Tau-eGFP reporter MEFs (with C57BL/6J 
and 129S4/SvJae background) were co-cultured with glia derived from CD-1 mice. 
To determine if any feeder cells contaminated the 20-22-day time points, we used 
the single cell RNA-seq reads to identify positions that differ from the mouse 
reference genome (mm10, built from strain C57BL/6 mice). We used the mpileup 
function in SAMtools to generate a multi-sample variant call format (VCF) file, and a
custom Python script to genotype the cells by requiring coverage in all cells for all
positions, with a coverage depth of five reads, a phred GT likelihood =0 for called 
genotype and >40 for next-best genotype. This resulted in 95 informative sites
distinguishing more than one cell from the reference genome. We clustered cells
based on their genotype (homozygous reference, heterozygous, homozygous
alternate), and identified cells that were strongly different from the reference genome.
These cells expressed either astrocyte (Gfap) or microglia marker genes suggesting 
they were contaminants from the feeder cell culture. We removed these cells from 
subsequent analyses. 
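
The custom genotyping script is not reproduced in the text. As a rough sketch of the filter described above, and not the authors' code, the following assumes that each cell contributes a read depth and a triple of phred-scaled genotype likelihoods (the PL field of a VCF record, ordered homozygous reference, heterozygous, homozygous alternate) at each biallelic site. It also assumes that "a coverage depth of five reads" means at least five reads, and that an informative site is one at which more than one cell is non-reference. All names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Order of the three PL values for a biallelic site in a VCF record.
GENOTYPES = ("hom_ref", "het", "hom_alt")

Observation = Tuple[int, Tuple[int, int, int]]  # (read depth, PL triple)


def call_genotype(depth: int, pl: Sequence[int],
                  min_depth: int = 5, min_gap: int = 40) -> Optional[str]:
    """Return a genotype label, or None if the call is not confident."""
    if depth < min_depth:
        return None
    ranked = sorted(range(3), key=lambda i: pl[i])
    best, runner_up = ranked[0], ranked[1]
    # Called genotype must have PL = 0 and the next-best genotype PL > 40.
    if pl[best] != 0 or pl[runner_up] <= min_gap:
        return None
    return GENOTYPES[best]


def informative_sites(sites: Dict[str, List[Observation]]) -> Dict[str, List[str]]:
    """Keep sites called in every cell where more than one cell is non-reference."""
    kept = {}
    for site, observations in sites.items():
        calls = [call_genotype(depth, pl) for depth, pl in observations]
        if any(call is None for call in calls):
            continue  # coverage is required in all cells
        if sum(call != "hom_ref" for call in calls) > 1:
            kept[site] = calls
    return kept
```

Cells could then be clustered on the resulting genotype vectors, as described above, to flag those that differ strongly from the reference strain.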

Approximate number of transcripts was calculated from FPKM values by using 
the correlation between number of transcripts of exogenous spike-in mRNA 
sequences and their respective measured mean FPKM values (Extended Data 
Fig. 2). The number of spike-in transcripts per single cell lysis reaction was calcu- 
lated using the concentration of each spike-in provided by the vendor (Ambion, 
Life Technologies), the approximate volume of the lysis chamber (10 nl) as well as 
the dilution of spike-in transcripts in the lysis reaction mix (40,000×). Transcript
levels were converted to the log-space by taking the logarithm to the base 2 
(Supplementary Data 1). R studio®” (https://www.rstudio.com/) was used to run 
custom R** scripts to perform principal component analysis (PCA, FactoMineR 
package), hierarchical clustering (stats package), variance analysis and to construct 
heat maps, correlation plots, box plots, scatter plots, violin plots, dendrograms, 
bar graphs, and histograms. Generally, ggplot2 and gplots packages were used to 
generate data graphs. 
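
The spike-in arithmetic described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' script: it assumes that the vendor concentration is given in attomoles per microlitre, and that the relationship between spike-in molecule number and mean FPKM is summarized by a straight-line least-squares fit in log10 space. The function names and the example values are hypothetical.

```python
import math
from typing import Sequence, Tuple

AVOGADRO = 6.02214076e23  # molecules per mole


def spike_in_molecules(conc_amol_per_ul: float,
                       chamber_volume_nl: float = 10.0,
                       dilution: float = 40_000.0) -> float:
    """Expected spike-in molecules in one lysis chamber."""
    moles_per_ul = conc_amol_per_ul * 1e-18   # attomoles to moles
    volume_ul = chamber_volume_nl * 1e-3      # nanolitres to microlitres
    return moles_per_ul * AVOGADRO * volume_ul / dilution


def fit_loglog(molecules: Sequence[float],
               mean_fpkm: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(molecules) = a + b * log10(FPKM)."""
    xs = [math.log10(f) for f in mean_fpkm]
    ys = [math.log10(m) for m in molecules]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fpkm_to_transcripts(fpkm: float, intercept: float, slope: float) -> float:
    """Approximate transcript number for an endogenous gene."""
    return 10 ** (intercept + slope * math.log10(fpkm))
```

With the fitted intercept and slope, an FPKM value for any endogenous gene can be converted to an approximate transcript count per cell.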

The Seurat package*”*” implemented in R was used to identify distinct cell 
populations present at d22 of Ascl1-only and BAM reprogramming (Fig. 3a-d).
t-distributed stochastic neighbour embedding (tSNE) was performed on all 
d20/d22 cells using the most significant genes (P < 1 × 10⁻³, with a maximum of
100 genes per principal component) that define the first three principal components 
of a PCA analysis on the data set. To further estimate the identity of each cell on 
the tSNE plot, we colour coded cells based on Pearson correlation of each single 
cell’s expression profile with the expression profile of bulk cortical neurons}*4, 
myocytes”*, and MEFs’? (Fig. 3). The Monocle package! was used to order cells 
on a pseudo-time course during MEF to iN cell reprogramming (Extended Data 
Fig. 7). Covariance network analysis and visualizations were done using igraph 
implemented in R"! (http://igraph.sf.net). 

To generate PCA plots and heat maps in Figs 1c-e, 2a, 3a and 4c, PCA was
performed on cells using all genes expressed in more than two cells and with a variance
in transcript level (log2[FPKM]) across all single cells greater than 2. This threshold
generally resulted in about 8,000-12,000 genes. Subsequently, genes with the
highest PC loadings (highest (top 50-100) positive or negative correlation coefficient
with one of the first one to two principal components) were identified and a heat
map was plotted with genes ordered based on their correlation coefficient with the
respective PC (Figs 1e, 2a, 4c). Cells in rows were ordered based on unsupervised
hierarchical clustering using Pearson correlation as distance metric (Figs 1e, 2a)
or based on their fractional identity as determined by quadratic programming
(Fig. 4c, see below).
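
The gene filter applied before PCA can be illustrated with a short sketch. It is not the authors' R code. It assumes that a gene counts as expressed in a cell when its log2[FPKM] value exceeds a cut-off (zero here, a hypothetical choice that the text does not specify) and that the variance is the sample variance, as returned by var() in R. Gene names and values are made up.

```python
from statistics import variance
from typing import Dict, List, Sequence


def filter_genes(log2_fpkm: Dict[str, Sequence[float]],
                 expressed_above: float = 0.0,
                 min_cells: int = 2,
                 min_variance: float = 2.0) -> List[str]:
    """Keep genes expressed in more than `min_cells` cells with variance > `min_variance`."""
    kept = []
    for gene, values in log2_fpkm.items():
        n_expressing = sum(v > expressed_above for v in values)
        if n_expressing > min_cells and variance(values) > min_variance:
            kept.append(gene)
    return kept


# Hypothetical expression values for four genes across five cells.
example = {
    "GeneA": [0, 0, 5, 6, 7],      # variable, expressed in three cells: kept
    "GeneB": [3, 3.1, 2.9, 3, 3],  # expressed everywhere but nearly constant
    "GeneC": [0, 0, 0, 0, 8],      # variable but expressed in one cell only
    "GeneD": [0, 4, 0, 5, 6],      # kept
}
print(filter_genes(example))  # ['GeneA', 'GeneD']
```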

Gene ontology enrichment analyses were performed using DAVID 
Bioinformatics Resources 6.7 of the National Institute of Allergy and Infectious 
Diseases**. Functional annotation clustering was performed and GO terms repre- 
sentative for top enriched annotation clusters are shown in Fig. 1f, Extended Data 
Figs 1e and 4a with their Bonferroni corrected P values. In addition, results of GO
enrichment analyses are provided in the Supplementary Data. 

To express a single cell transcriptome as a linear combination of primary cell 
type transcriptomes, we used published bulk RNA-seq data sets for primary murine 
neurons~‘, myocytes”’, and embryonic fibroblasts'? (Extended Data Fig. 6b, c), 
neurons” and embryonic fibroblasts!* (Fig. 4a) or neurons”*, embryonic fibro- 
blasts! and neuronal progenitor cells! (Fig. 4e). In each quadratic programming 
analysis, we first identified genes that were specifically (log2 fold change of 3 or
higher) expressed in each of the bulk data sets compared to the respective others 
(Supplementary Data 6). Using these genes, we then calculated the fractional iden- 
tities of each single cell using quadratic programming (R package ‘quadprog’). The 
resulting fractional neuron identities of cells on the MEF-to-iN cell
reprogramming path (265 cells in total, excluding cells that were Tau-eGFP-negative at d5 or
myocyte- and fibroblast-like cells at d22) were used to order cells in a
pseudo-temporal manner (Fig. 4a-c, e, f). We compared this fractional neuron identity
based cell ordering with pseudo-temporal ordering of cells based on Monocle 
(Extended Data Fig. 7b-d), an algorithm that combines differential dimension 
reduction using independent component analysis with minimal spanning tree 
construction to link cells along a pseudotemporally ordered path’. Monocle 
analysis was performed using genes differentially expressed between neuron” and 
embryonic fibroblast!? bulk RNA-seq data (same gene set that was used when 
calculating fractional neuron and fibroblast identities in Fig. 4a, genes listed in 
Supplementary Data 6). 
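
The decomposition itself was solved with the R package 'quadprog'. As a rough illustration only, and not the authors' implementation, the two-reference case (MEF and neuron, as in Fig. 4a) can be worked through in plain Python. The sketch assumes that the two fractions are non-negative and sum to one, a constraint that the text does not spell out; under that assumption the quadratic program reduces to a one-dimensional least-squares problem with a closed-form solution. Function names are hypothetical, and the toy vectors stand in for expression values of the cell-type-specific genes.

```python
from typing import Sequence, Tuple


def fractional_identity(cell: Sequence[float],
                        mef: Sequence[float],
                        neuron: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MEF fraction, neuron fraction, squared residual).

    Minimizes ||cell - (w * mef + (1 - w) * neuron)||^2 subject to 0 <= w <= 1.
    """
    diff = [m - n for m, n in zip(mef, neuron)]
    norm_sq = sum(d * d for d in diff)
    if norm_sq == 0:
        raise ValueError("reference profiles must differ")
    # Unconstrained optimum of the scalar problem, then clipped to [0, 1];
    # clipping is exact because the objective is convex in w.
    w = sum((c - n) * d for c, n, d in zip(cell, neuron, diff)) / norm_sq
    w_mef = min(1.0, max(0.0, w))
    w_neuron = 1.0 - w_mef
    residual = sum((c - (w_mef * m + w_neuron * n)) ** 2
                   for c, m, n in zip(cell, mef, neuron))
    return w_mef, w_neuron, residual


# Toy reference profiles over three genes (hypothetical values).
mef_bulk = [10.0, 0.0, 2.0]
neuron_bulk = [0.0, 10.0, 2.0]

# A cell that is an exact 25:75 mixture, and one with expression
# that neither reference explains (non-zero residual).
print(fractional_identity([2.5, 7.5, 2.0], mef_bulk, neuron_bulk))
print(fractional_identity([5.0, 5.0, 6.0], mef_bulk, neuron_bulk))
```

Sorting cells by the returned neuron fraction gives a pseudo-temporal order of the kind described above. A large residual marks cells whose transcriptome is poorly explained by a mixture of the two references, which parallels the increase in residuals reported for intermediate cells. With three or more reference profiles a general quadratic programming solver is needed.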

For the transcription factor network analysis (Fig. 4d), we computed a pairwise 
correlation matrix (Pearson correlation, visualized in correlogram in Extended 
Data Fig. 8a) for transcriptional regulators annotated as such in the Animal
Transcription Factor Database (http://www.bioguo.org/AnimalTFDB/)* and 
identified those transcriptional regulators (TRs) with a Pearson correlation of 
greater than 0.35 with at least five other TRs (82 TRs, shown in Extended Data 
Fig. 8b). We used a permutation approach to determine the probability of finding 
TRs meeting this threshold by chance. We performed 500 random permutations 
of the expression matrix of all TRs across cells on the MEF-to-iN cell lineage, and 
calculated the pairwise correlation matrix for each permutation of the input data 
frame. All randomized data frames resulted in 0 TRs that met our threshold. This 
shows that our correlation threshold is strict, and all nodes and connections that we 
present in the TR network are highly unlikely to have arisen by chance. We used the pairwise
correlation matrix for the selected TRs as input into the function graph.adjacency() 
of igraph implemented in R" (http://igraph.sf.net) to generate a weighted network 
graph, in which the selected TRs are presented as vertices and all pairwise corre- 
lations >0.25 are presented as edges linking the respective vertices. The network 
graph was visualized using the Fruchterman-Reingold layout and the three clear 
subnetworks (MEF, initiation, maturation) were manually colour coded. 
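
The threshold and the permutation check can be sketched as follows. This is an illustration, not the authors' R code. It assumes that each permutation shuffles the expression values of every transcriptional regulator independently across cells; the text says only that the expression matrix was randomly permuted. All names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0


def count_hub_trs(expr: Dict[str, List[float]],
                  r_min: float = 0.35, min_partners: int = 5) -> int:
    """Number of TRs correlated above r_min with at least min_partners other TRs."""
    names = list(expr)
    partners = {name: 0 for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(expr[a], expr[b]) > r_min:
                partners[a] += 1
                partners[b] += 1
    return sum(1 for name in names if partners[name] >= min_partners)


def permutation_null(expr: Dict[str, List[float]], n_perm: int = 500,
                     seed: int = 0) -> List[int]:
    """Hub counts obtained after shuffling each TR across cells."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_perm):
        shuffled = {name: rng.sample(values, len(values))
                    for name, values in expr.items()}
        counts.append(count_hub_trs(shuffled))
    return counts
```

Comparing the observed hub count with the distribution returned by permutation_null gives an empirical estimate of how often the threshold is met by chance.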

We used Pearson correlation of each single cell expression profile with the 
expression profile of bulk cortical neurons!?4, myocytes”, and MEFs’ to further 
estimate the identity of each single cell and to estimate when alternative fates 
emerge (Fig. 3c, Extended Data Fig. 6d, e). For this analysis, we considered the same 
cell type specific gene sets that were used in the quadratic programming analysis, 
that is, genes specifically expressed (log2 fold change of 3 or higher) in a
respective bulk RNA-seq data set compared to the others (Supplementary Data 6). 
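
A minimal sketch of this correlation-based assignment is given below. It is not the authors' code; the function names are hypothetical and the short vectors are made-up stand-ins for expression values over the cell-type-specific gene sets.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def best_matching_bulk(cell: Sequence[float],
                       bulk: Dict[str, Sequence[float]]) -> Tuple[str, Dict[str, float]]:
    """Return the bulk profile with the highest correlation, plus all scores."""
    scores = {name: pearson(cell, profile) for name, profile in bulk.items()}
    return max(scores, key=scores.get), scores


# Hypothetical bulk profiles over six marker genes.
bulk_profiles = {
    "neuron": [8, 7, 1, 0, 0, 1],
    "myocyte": [0, 1, 9, 8, 1, 0],
    "MEF": [1, 0, 0, 1, 7, 9],
}
label, scores = best_matching_bulk([6, 5, 2, 1, 0, 0], bulk_profiles)
print(label)  # neuron
```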

To estimate intercellular heterogeneity of d0 MEFs, we calculated the variance
for each gene across all MEF cells as well as across mouse embryonic stem cells 
under 2iLIF culture conditions“ and across glioblastoma cells*°. We then plotted 
the distribution of variances for all genes per cell population as box plots. 
Quantitative RT-PCR and immunostaining. Ascl1 infected Tau-eGFP reporter 
MEFs were FAC-sorted 5, 7, 10, 12 or 22 days post-Ascl1 induction with dox.
RNA was then extracted from both Tau-eGFP positive and negative populations 
from each time point, as well as uninfected control MEFs and unsorted d2 Ascl1- 
infected MEFs using the TRIzol RNA isolation protocol (Invitrogen, 15596-018). 
Reverse transcription into cDNA was performed using the SuperScript III First- 
strand Synthesis System (Invitrogen, 18080-051) and qRT-PCR was performed 
using Sybr Green (Thermo Fisher Scientific, 4309155). Immunostaining was per- 
formed as previously described®. Antibodies and qRT-PCR primers are listed in 
the Supplementary Information. 

Time-lapse imaging of Ascl1 expression. MEFs were isolated from E13.5 CD-1 
embryos (Charles River) and infected with a dox-inducible, N-terminal-tagged 
eGFP-Ascl1 fusion construct using the protocol previously described! Cells were
plated on 35cm glass bottom dishes (MatTek), coated with polyornithine (Sigma
P3655) and laminin (Invitrogen 23017-015). Imaging experiments were performed
between 3 and 6 days post dox induction, in a temperature- and CO2-controlled
chamber. Images were taken for up to 10 positions per dish, for 3 dishes, every
45 min with a Zeiss AxioVert 200M microscope with an automated stage using
an EC Plan-Neofluar 5×/0.16 NA Ph1 objective or an A-Plan 10×/0.25 NA Ph1
objective. Cells were fixed at 6 days and immunostained using Tuj1 antibodies
recognizing neuronal Tubb3 (Covance MRB-435P) to confirm neuronal identity. We
used ImageJ to segment individual cells and measure the level of GFP for 7 Tuj1+
cells and 7 Tuj1− cells over time. Average intensity was obtained by normalizing
the average intensity of a cell segment by the average background intensity of an
adjacent segment of the same size. A t-test was performed comparing Tuj1+ and
Tuj1− cells at each time point to evaluate significance.
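
The normalization and the per-time-point comparison can be sketched as follows. This is an illustration only: the measurements were made in ImageJ, and the text does not state which variant of the t-test was used, so the unequal-variance (Welch) statistic below is an assumption. Function names and pixel values are hypothetical, and the conversion of the statistic to a P value is omitted.

```python
import math
from statistics import mean, variance
from typing import Sequence


def normalized_intensity(cell_pixels: Sequence[float],
                         background_pixels: Sequence[float]) -> float:
    """Mean intensity of a cell segment divided by that of an adjacent background segment."""
    return mean(cell_pixels) / mean(background_pixels)


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Welch's t statistic for two independent groups at one time point."""
    se_sq_a = variance(group_a) / len(group_a)
    se_sq_b = variance(group_b) / len(group_b)
    return (mean(group_a) - mean(group_b)) / math.sqrt(se_sq_a + se_sq_b)


# Hypothetical pixel values for one cell and its adjacent background.
print(normalized_intensity([10, 12, 14], [4, 4, 4]))  # 3.0
```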

Antibodies. Rabbit anti-Ascl1 (Abcam ab74065), chicken anti-GFP (Abcam
ab13970), rabbit anti-Tubb3 (Covance MRB-435P), mouse anti-Tubb3 (Covance
MMS-435P), mouse anti-Map2 (Sigma M4403), rabbit anti-Myh3 (Santa
Cruz sc-20641), goat anti-Dlx3 (Santa Cruz sc-18143), mouse anti-β-Actin (Sigma
A5441), rabbit anti-Tcf12 (Bethyl A300-754A).

Primers. General. Gapdh (forward: AGGTCGGTGTGAACGGATTTG, 
reverse: TGTAGACCATGTAGTTGAGGTCA); Ascl1 (TetO) (forward: CCGAA 
TTCGCTAGCCACCAT, reverse: AAGAAGCAGGCTGCGGG). 

Initiation factors. Atoh8 (forward: GCCAAGAAACGGAAGGAGTGA, 
reverse: CTGAGAGATGGTACACGGGC); Dlx3 (forward: CGCCGCTCCAA 
GTTCAAAAA, reverse: GTGGTACCAGGAGTTGGTGG); Hes6 (forward: 
TACCGAGGTGCAGGCCAA, reverse: AGTTCAGCTGAGACAGTGGC);
Sox11 (forward: CCTGTCGCTGGTGGATAAGG, reverse: CTGCGCCTCTCAATACGTGA);
Sox9 (forward: CGAGCACTCTGGGCAATCTCA, reverse:
ATGACGTCGCTGCTCAGTTC); Tcf4 (forward: CAGTGCGATGT 
TTTCGCCTC, reverse: ATGTGACCCAAGATCCCTGC); Tcf12 (forward: 
GTCTCGAATGGAAGACCGCT, reverse: GITTCCGACCATCGAAGCTGA). 
Maturation factors. Camta1 (forward: CCCCTAAGACAAGACCGCAG,
reverse: ACATAGCAGCCGTACAAGCA); Insm1 (forward: GACCCGG 
CACATCAACAAGT, reverse: GAAGCGAAGCGAAGAGGACA); Myt1l 
(forward: ATGTTCCCACAACCACACCA, reverse: TACCGCTTGGCATCGTCATA);
St18 (forward: TGCCAAGGGAGCTGAGATAGA, reverse: GAAGGCTGCTTGCGTTGAAT).

Neuronal genes. Gria2 (forward: GGGGACAAGGCGTGGAAATA, 
reverse: GTACCCAATCTTCCGGGGTC); Map2 (forward: CAGAGAAACAGCAGAGGAGGT, 
reverse: TTTGTTCTGAGGCTGGCGAT); Nrxn3 (forward: TGTGAACCAAGTACAGATAAGAGT, 
reverse: CAGCTCAGGGGACAAAGAGG); Snap25 (forward: TTCATCCGCAGGGTAACAAA, 
reverse: GTTGCACGTTGGTTGGCTT); Stmn3 (forward: AGCACCGTATCTGCCTACAAG, 
reverse: TGGTAGATGGTGTTCGGGTG); Tubb3 (forward: CAGATAGGGGCCAAGTTCTGG, 
reverse: GTTGTCGGGCCTGAATAGGT). 

Myocyte genes. Acta1 (forward: CTAGACACCATGTGCGACGA, 
reverse: CATACCTACCATGACACCCTGG); Myh3 (forward: AAATGAAGGGGACGCTGGAG, 
reverse: CAGCTGGAAGGTGACTCTGG); Myo18b (forward: TGCCCTCTTCAGGGAAGGTA, 
reverse: GAGCTTCTCCACTGACACCC); Tnnc2 (forward: CAACCATGACGGACCAACAG, 
reverse: GTGTCTGCCCTAGCATCCTC). 

Fibroblast genes. Col1a2 (forward: AGTCGATGGCTGCTCCAAAA, 
reverse: ATTTGAAACAGACGGGGCCA); Dcn (forward: GCAAAATCAGTCCAGAGGCA, 
reverse: CGCCCAGTTCTATGACAAGC). 

Extended Data Figure 1 | The majority of MEFs are actively undergoing 
cell cycle, but exit cell cycle upon Ascl1 induction. a, Live cell imaging 
of Tau-eGFP reporter over the course of BAM-mediated iN cell 
reprogramming. Tau-eGFP fluorescence normalized to the maximum 
expression is shown in relation to days post-BAM induction. Tau-eGFP 
expression began at day 5 and reached a peak at day 8 after induction. 
Shown are representative images from day 0, day 5 and day 9. b, Box 
plots of intercellular transcriptome variance showed that MEFs are more 
heterogeneous than mouse embryonic stem cells under 2iLIF culture 
conditions and less heterogeneous than glioblastoma cells. c, PCA 
of genes with most variance in day 0 MEFs revealed MEF heterogeneity 
(blue, A). Density plot showing the distribution of number of cells along 
PC1 loading is shown above the PCA plot. d, Heat map and hierarchical 
clustering of genes used for the PCA in panel c shows two major MEF 
subpopulations. Each column represents a single cell, and each row 
a gene. Subpopulation A is highlighted in blue in the dendrogram. 
e, GO enrichment for genes in c shows that MEF subpopulation A is 
distinguished by low or absent expression of genes enriched for cell 
cycle terms. f, g, PCA and heat map of the same genes used in panels 
c-e, this time including day 0 MEFs (circles, light green) and day 2 cells 
(squares, dark green), showed that most of the day 2 cells had the same cell 
cycle signature as MEF subpopulation A. Cells in columns of both heat 
maps are ordered based on PC1 loading. 

Extended Data Figure 2 | Total number of transcripts per cell decreases 
during MEF-to-iN cell reprogramming. a, Average detected transcript 
levels (mean FPKM, log2) for 92 ERCC RNA spike-ins as a function 
of provided number of molecules per lysis reaction for each of the 
8 independent single-cell RNA-seq experiments. Linear regression fits 
through data points are shown. The length of each ERCC RNA spike-in 
transcript is encoded in the size of the data points. No particular bias 
towards the detection of shorter versus longer transcripts is observed. 
The linear regression fit was used to convert FPKM values to approximate 
number of transcripts. b, Box plots showing the distribution of the total 
number of transcripts per single cell for each experiment. The number of 
transcripts per cell was calculated from the FPKM values of all genes in 
each cell using the correlation between number of transcripts of exogenous 
spike-in mRNA sequences and their respective measured mean FPKM 
values (calibration curves are shown in panel a). The total number of 
transcripts expressed by a single cell and detected by single-cell RNA-seq is 
highest in MEFs and is more than twofold decreased upon overexpression 
of Ascl1 or BAM. c, Box plots showing the distribution of the median 
transcript number per gene across all cells of one experiment. The 
distributions are similar over the course of iN cell reprogramming. 
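
The spike-in calibration described in this legend can be illustrated with a short sketch. It assumes, as a labelled assumption, that the regression is fitted in log2-log2 space (the legend reports mean FPKM on a log2 scale) and that the fit is then inverted to estimate molecule numbers; the function names and example values are hypothetical and are not taken from the study.

```python
from math import log2


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def calibrate(spike_molecules, spike_fpkm):
    """Fit log2(FPKM) against log2(molecules) for the spike-ins."""
    return fit_line([log2(m) for m in spike_molecules],
                    [log2(f) for f in spike_fpkm])


def fpkm_to_transcripts(fpkm, slope, intercept):
    """Invert the calibration line to approximate a transcript number."""
    return 2 ** ((log2(fpkm) - intercept) / slope)


def total_transcripts(cell_fpkm, slope, intercept):
    """Sum approximate transcript numbers over all detected genes of one cell."""
    return sum(fpkm_to_transcripts(f, slope, intercept) for f in cell_fpkm if f > 0)


# Hypothetical spike-in inputs (molecules per reaction) and measured FPKM values.
slope, intercept = calibrate([1, 4, 16, 64], [2, 8, 32, 128])
```

Genes with an FPKM of zero are skipped because their logarithm is undefined; how undetected genes were handled in the original analysis is not stated.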

Extended Data Figure 3 | Clonal MEFs reprogram successfully into iN 
cells, and Ascl1-only and BAM induce similar responses during early iN 
cell reprogramming. a, Immunostaining of heterogeneous Ascl1-infected 
MEFs and clonal MEFs with homogeneous Ascl1 transgene insertions, 
fixed 12 days after Ascl1 induction, using rabbit anti-Tubb3 (red) and 
mouse anti-Map2 (cyan) antibodies and DAPI (blue) as a nuclear stain. 
Reprogramming efficiencies are comparable regardless of variation in 
Ascl1 copy numbers. Images are representative of one reprogramming 
experiment. b, Bar plots showing expression of Ascl1-target genes (Hes6, 
Zfp238, Snca, Cox8b, Bex1, Dner) and MEF marker genes averaged across 
single cells from day 0 MEFs and day 2 Ascl1-only cells, as well as from 
bulk RNA-seq data from MEFs, day 2 BAM, and day 2 Ascl1-only cells. 
These data show that the initiation of reprogramming at day 2 is similar for 
Ascl1-alone and BAM-mediated reprogramming. 

Extended Data Figure 4 | Failed reprogramming at day 5 correlates 
with silencing of Ascl1. a, Bonferroni-corrected P values for gene 
ontology enrichments are shown for each group of genes from Fig. 2a, with 
representative genes listed (Supplementary Data 4). b, Biplot showing 
Tau-eGFP fluorescence intensity as a function of Ascl1 transcript level 
in day 5 cells. Point size is proportional to eGFP transcript levels in 
log2[FPKM]. There is a positive correlation (R² = 0.49) indicating that cells 
with higher Ascl1 expression are more likely to reprogram. c, Heat map of 
eGFP-Ascl1 expression in 14 individual cells (columns) during live cell 
imaging. Rows represent time post Ascl1 induction in 45-min intervals. 

Extended Data Figure 5 | Live cell imaging shows diminishing of 
eGFP-Ascl1 signal in cells that fail to reprogram. a, Immunostaining 
for Tubb3 and Map2 at day 12 post induction of Ascl1, C-terminal tagged 
Ascl1-eGFP and N-terminal tagged eGFP-Ascl1 in CD-1 MEFs. eGFP-Ascl1 
has comparable reprogramming efficiency with untagged Ascl1 
while Ascl1-eGFP has a much reduced reprogramming efficiency, so 
eGFP-Ascl1 was chosen for live cell imaging. Images are representative of 
one reprogramming experiment per condition. b, Representative images 
from live cell imaging showing an example of diminishing of eGFP signal 
in a cell that failed to reprogram (that is, cell was Tuj1-negative at day 6). 
c, Live cell imaging of eGFP signal of eGFP-Ascl1 infected MEFs between 
3-6 days post dox induction. d, eGFP imaging of live cells 6 days post 
induction of Ascl1 and corresponding immunostaining for Tubb3 after 
fixation. 

Extended Data Figure 6 | Brn2 and Myt1l repress alternative fates that 
compete with the iN cell fate during advanced Ascl1 reprogramming. 
a, Scatter plot showing PC1 and PC2 loadings from principal component 
analysis (PCA) of single cells from all time points with experimental time 
point and reprogramming condition (Ascl1 versus BAM) encoded in point 
shape and colour. b, Overview of quadratic programming. Fractional 
identities are calculated assuming a linear combination of different cell 
fates. c, Biplots showing the fractional fibroblast identity as a function of 
fractional neuron (left) and fractional myocyte (right) identity for each 
cell with points shaped and colour coded based on reprogramming time 
point and condition. d, Correlation of transcriptomes from days 0, 2, 5, 
and 20/22 cells (Ascl1-only and BAM-induced) with bulk RNA-seq from 
MEFs, cortical neurons and myocytes. Bottom bars show Tau-eGFP 
fluorescence intensity. e, Bar plot quantifying the number of cells with a 
maximum correlation to bulk RNA-seq data from each of the observed 
fates (d). f, Immunofluorescent detection of Tau-eGFP (green), DAPI 
(blue), Myh3 (red) and Tubb3 (cyan) for day 22 cells that were infected 
with Ascl1 co-infected with Brn2 or Myt1l. See Fig. 3e for respective 
data for cells infected with Ascl1-only or all three BAM factors. Images 
are representative of four biological replicates. Right, mean fractions of 
eGFP+ cells that express either Tubb3 or Myh3. Only Tubb3+ cells with 
a neuronal morphology were counted. Co-expression of Ascl1 with Brn2 
and/or Myt1l increases the fraction of Tau-eGFP+ cells that are also Tubb3+, 
while decreasing the number of cells that are Myh3+. Six or seven images 
were analysed for each of four biological replicates. Error bars, s.e.m. 
g-i, qRT-PCR of selected myogenic (g), neuronal (h), and fibroblast (i) 
markers using day 22 cells that are infected with Ascl1 only or 
co-infected with Brn2 or Myt1l or both and FAC-sorted by Tau-eGFP 
(n = 3, biological replicates; error bars, s.e.m.). Myogenic genes were 
significantly downregulated in Tau-eGFP+ cells that were co-infected 
with Brn2 and/or Myt1l compared to those infected with Ascl1 alone, 
while some neuronal genes are significantly upregulated (Map2, Gria2) 
(*P < 0.05, **P < 0.01, ***P < 0.001, two-tailed t-test). 

Extended Data Figure 7 | Comparison of Monocle and quadratic 
programming with respect to ordering of neuronal cells through the 
reprogramming path. a, Biplot showing the total number of transcripts 
per cell for all cells on the MEF-to-iN cell lineage as a function of the 
fractional neuron identity of each cell (see Fig. 4). The total number of 
transcripts decreases during the reprogramming process. b, Cells (depicted 
as circles) are arranged in the 2D independent component space based on 
the expression of genes used for quadratic programming in Fig. 4a. Lines 
connecting cells represent the edges of a minimal spanning tree with the 
bold black line indicating the longest path. Time points are colour coded. 
c, Monocle plots with single cells coloured based on gene expression that 
distinguishes the stages of iN cell reprogramming. d, Biplot shows the 
correlation between ordering of cells based on pseudo-time (Monocle) 
and fractional identity (quadratic programming). Time points are colour 
coded. Pearson correlation coefficient = 0.91. 

Extended Data Figure 8 | Neuronal maturation proceeds through 
expression of distinct transcriptional regulators. a, Correlogram 
showing transcriptional regulators (TRs) highly correlated within MEFs as 
well as the initiation phase and the maturation phase of reprogramming. 
b, Heat map shows expression of TRs that control the two stages of MEF-to-iN 
cell reprogramming (Fig. 4d) in cells ordered based on fractional 
neuron identity. Each row represents a single cell, each column a gene. 
Experimental time point (green/blue sidebar) and fractional neuron 
identity (yellow/red sidebar) are shown at the top. c-e, Pseudo-temporal 
expression dynamics of exemplary TRs marking the initiation stage (c) 
and the maturation stage (d) of iN cell reprogramming as well as MEF 
identity (e). Transcript levels of the TRs are shown across all single 
cells on the MEF-to-iN cell lineage ordered based on fractional neuron 
identity. Growth curves based on a model-free spline method were fitted 
to the data. f, qRT-PCR of selected TRs from initiation and maturation 
subnetworks from Fig. 4d. Uninfected MEF controls and day 2-12 Ascl1-infected 
cells were assayed for all selected TRs, and day 22 Ascl1-alone 
and BAM-infected cells were additionally assayed for maturation TRs. 
Cells for day 5 to day 22 samples were FAC-sorted into Tau-eGFP+ and 
Tau-eGFP− populations (n = 4 for all populations, biological replicates; 
error bars, s.e.m.). g, Western blot for selected TRs from the initiation 
subnetwork presented in panel b. β-Actin was used as a loading control 
(Supplementary Data 8). 


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


doi:10.1038/nature18300 


Systemic RNA delivery to dendritic cells exploits 
antiviral defence for cancer immunotherapy 
Abderraouf Selmi, Andreas N. Kuhn, Janina Buck, Evelyna Derhovanessian, Richard Rae, Sebastian Attig, 
Jan Diekmann, Robert A. Jabulowsky, Sandra Heesch, Jessica Hassel, Peter Langguth, Stephan Grabbe, Christoph Huber, 

Lymphoid organs, in which antigen-presenting cells (APCs) are 
in close proximity to T cells, are the ideal microenvironment for 
efficient priming and amplification of T-cell responses¹. However, 
the systemic delivery of vaccine antigens into dendritic cells (DCs) 
is hampered by various technical challenges. Here we show that DCs 
can be targeted precisely and effectively in vivo using intravenously 
administered RNA-lipoplexes (RNA-LPX) based on well-known 
lipid carriers by optimally adjusting net charge, without the need 
for functionalization of particles with molecular ligands. The 
LPX protects RNA from extracellular ribonucleases and mediates 
its efficient uptake and expression of the encoded antigen by DC 
populations and macrophages in various lymphoid compartments. 
RNA-LPX triggers interferon-α (IFNα) release by plasmacytoid 
DCs and macrophages. Consequently, DC maturation in situ and 
inflammatory immune mechanisms reminiscent of those in the early 
systemic phase of viral infection are activated². We show that RNA-LPX 
encoding viral or mutant neo-antigens or endogenous self-antigens 
induce strong effector and memory T-cell responses, and 
mediate potent IFNα-dependent rejection of progressive tumours. 
A phase I dose-escalation trial testing RNA-LPX that encode shared 
tumour antigens is ongoing. In the first three melanoma patients 
treated at a low-dose level, IFNα and strong antigen-specific T-cell 
responses were induced, supporting the identified mode of action 
and potency. As any polypeptide-based antigen can be encoded 
as RNA³,⁴, RNA-LPX represent a universally applicable vaccine 
class for systemic DC targeting and synchronized induction of 
both highly potent adaptive as well as type-I-IFN-mediated innate 
immune mechanisms for cancer immunotherapy. 

DCs initiate immune responses in lymphoid tissues upon early 
sensing of infectious pathogens⁵. Previous work aimed at gene delivery 
to DCs largely resorted to functionalization of nanoparticles with 
molecular ligands⁶⁻⁸. Antigen-encoding RNA formulations have been 
used for local⁹⁻¹¹ and systemic injection in various RNA vaccine studies 
and resulted in antigen-specific T-cell responses, albeit with low 
antitumour activity¹²,¹³. We engineered RNA-containing nanoparticles 
differing in their molecular characteristics, for example, carrier 
composition, charge ratio (lipid to RNA ratio) and ionic conditions, 
and then analysed particle size, colloidal stability, RNA integrity, 
free RNA and zeta potential¹⁴. For in vivo testing, RNA nanoparticles 
encoding the reporter gene firefly luciferase (Luc-RNA) were 
injected intravenously (i.v.) into mice to assess biodistribution of the 
Luc signal. Whereas injection of naked Luc-RNA did not generate a 
reporter signal, several of the carrier-RNA formulations gave characteristic 
patterns of in vivo organ transfection, indicating protection 
and efficient translation of RNA (Extended Data Fig. 1a). Positively 
charged particles with higher in vitro transfection efficiencies compared 
to neutral or negatively charged compositions have previously 
been the focus of in vivo studies for nucleic acid delivery¹⁵,¹⁶. To 
systematically evaluate the effect of overall particle charge on in vivo 
targeting of DCs, which has remained unexplored to date, we varied 
lipid:RNA ratios. Cationic liposomes composed of the broadly used 
lipids DOTMA and DOPE formed colloidally stable nanoparticu- 
late RNA-LPX of reproducible particle size (200-400 nm) and charge 
(Fig. 1a, Extended Data Fig. 1b) with positive as well as negative 
excess charge. Slightly positively charged and near-neutral RNA-LPX 
(positive to negative charge ratio from 2.5:1 to 1.8:2), in contrast, were 
unstable, forming large aggregates immediately after preparation. 
Positively charged Luc-RNA-LPX (charge ratio of 5:1), as typically 
used for gene delivery¹⁶, targeted Luc expression predominantly in 
the lungs of mice and less in the spleen (Fig. 1b). Surprisingly, gradual 
decrease of the cationic lipid content shifted Luc expression from the 
lungs towards the spleen. Near-neutral and slightly negative particles 
(for example, charge ratio of 1.7:2) provided an exclusively splenic 
signal (Fig. 1b). RNA-LPX of further lowered charge ratio (<1.7:2) 
were medium-sized (~200-320 nm), of low polydispersity, and were 
all expressed in the spleen. However, the transfection efficiency grad- 
ually declined with increasing negative charge, probably owing to 
increasing amounts of uncomplexed free RNA (Extended Data Fig. 1c). 
The unexpected selective targeting of negatively charged particles 
to the spleen prompted us to test various other well-characterized 
lipid compositions (for example, DOTAP, cholesterol). Irrespective of 
the lipids used, all particles with an excess negative charge exhibited 
pharmacologically suitable physicochemical properties (Extended 
Data Fig. 1d) and led to selective antigen expression in splenic cell 
populations (Extended Data Fig. 1e). 
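
To make the charge ratios quoted above concrete, the following sketch computes a positive-to-negative charge ratio from the amounts of cationic lipid and RNA. It is an illustration under stated assumptions, not the formulation protocol of the study: one positive charge per cationic lipid molecule, one negative charge per nucleotide, and an approximate average nucleotide mass of 330 g/mol are all assumed, and the function names and amounts are hypothetical.

```python
AVG_NT_MASS = 330.0  # g/mol per nucleotide; approximate average (assumption)


def charge_ratio(cationic_lipid_nmol, rna_ug, nt_mass=AVG_NT_MASS):
    """Positive:negative charge ratio, assuming +1 per cationic lipid
    molecule and -1 per RNA nucleotide (one phosphate each)."""
    negative_nmol = rna_ug * 1000.0 / nt_mass  # ug -> ng; ng / (g/mol) = nmol
    return cationic_lipid_nmol / negative_nmol


def as_x_to_2(ratio):
    """Express a ratio in the 'x:2' notation used in the text, e.g. 1.3:2."""
    return f"{2 * ratio:.1f}:2"


def net_charge(ratio):
    """Classify the excess charge implied by a positive:negative ratio."""
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "negative"
    return "neutral"


# Hypothetical amounts: 65 nmol cationic lipid complexed with 33 ug RNA.
ratio = charge_ratio(65.0, 33.0)
```

With these hypothetical amounts the ratio is 0.65, that is 1.3:2, which falls in the range with excess negative charge.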

For further characterization, we selected an RNA-LPX formulation 
with a charge ratio of 1.3:2, which effectively targeted RNA to the spleen 
(Extended Data Fig. 1f), formed monodisperse and stable particles 
(Fig. 1c, Extended Data Fig. 1g) and was fully resistant to degradation 
by mouse serum at 37°C (Extended Data Fig. 1h). In CD11c-DTR 
mice (expressing diphtheria toxin receptor under control of the Cd11c 
promoter) depleted of CD11c+ cells before injection of this RNA-LPX, 
the reporter signal was almost undetectable in the spleen, indicating 
that APCs are the source of Luc expression (Fig. 1d). 

Goldgrube 12, Mainz 55131, Germany. Department of Dermatology, University Medical Center of the Johannes Gutenberg University, Langenbeckstr. 1, Mainz 55131, Germany. Department 
of Dermatology, Heidelberg University Hospital, Im Neuenheimer Feld 440, 69120 Heidelberg, Germany. Institute of Pharmacy and Biochemistry, Johannes Gutenberg University, Germany, 
Langenbeckstr. 1, Mainz 55131, Germany. Cluster for Individualized Immune Intervention, Kupferbergterasse 19, Mainz 55116, Germany. 

Figure 1 | RNA-LPX of negative net charge deliver RNA-encoded 
antigens body-wide to lymphoid-resident DCs. a, Particle size, 
polydispersity index (top) and zeta potential (bottom) (n = 3) of 
RNA-LPX constituted with DOTMA/DOPE liposomes and Luc-RNA 
at various charge ratios. b, Bioluminescence imaging of BALB/c mice 
(n = 3) after i.v. injection of Luc-LPX at various charge ratios. Pie charts 
show relative contribution of each organ to total signal. c, Particle size 
and polydispersity index of Luc-LPX either constituted freshly or stored 
at 4 °C followed by 24 h incubation at room temperature (top) and 
bioluminescence imaging of the spleens of BALB/c mice (n = 8, pooled 
from two experiments) after i.v. injection (bottom). d, Bioluminescence 
imaging after i.v. injection of Luc-LPX in CD11c-DTR mice (n = 3) 
depleted (depl.) of CD11c+ cells. e, Splenic localization of CD11c and 
Cy3 double-positive cells in BALB/c mice (n = 2) 1 h after i.v. injection of 
Cy3-labelled RNA-LPX. Scale bar, 100 μm. MZ, marginal zone; RP, red 
pulp; WP, white pulp. f, eGFP expression in splenic cell subsets of C57BL/6 
mice (n = 3) 24 h after i.v. injection of eGFP-LPX by flow cytometry. 
g, Bioluminescence imaging of inguinal lymph nodes (LN), femur and 
tibia in BALB/c mice (n = 3) after i.v. injection of Luc-LPX. LDL, lower 
detection limit. h, Bioluminescence imaging of inguinal lymph nodes and 
ex vivo Luc assay of bone marrow (BM) single-cell suspensions after i.v. 
injection of Luc-LPX in CD11c+ cell-depleted CD11c-DTR mice (n = 3). 
Significance was determined using unpaired two-tailed Student's t-test. 
Error bars, median (c, bottom), otherwise mean ± s.d. 

We identified CD11c+ conventional (c)DCs in the marginal zone, 
and plasmacytoid (p)DCs and macrophages in the spleen as the cell 
subsets internalizing RNA-LPX by a set of experiments in which 
RNA-LPX with Cy3- or Cy5-labelled RNA or enhanced green fluorescent 
protein (eGFP) was administered (Fig. 1e, f, Extended Data 
Fig. 2a, b). 

The rate of RNA uptake was highest in macrophages (Extended Data 
Fig. 2a), whereas the highest eGFP transfection rate as a measure of 
translation efficiency was observed in cDCs (Fig. 1f), indicating that 
DCs are more effective in cytoplasmic translocation and translation of 
RNA. Natural killer (NK), B and T cells, in contrast, did not exhibit relevant 
uptake. Concordant with the fact that the spleen, as the organ with 
the highest density of APCs, is known to be highly efficient in clearance 
of blood-borne pathogens, we found that i.v.-administered liposomal 
RNA was rapidly cleared from the blood within 1 h (Extended Data 
Fig. 2c). 

Analysis of organs explanted after i.v. administration of RNA-LPX 
encoding Thy1.1 (enabling sensitive detection of Thy1.1+ transfected 
cells in Thy1.2 mice) or containing Cy5-labelled reporter RNA revealed 
that not only APCs in the spleen are targeted. In the liver, we detected 
Cy5-labelled RNA in a small portion of cells, and Thy1.1 expression in 
CD11b+ macrophages (Extended Data Fig. 2d). Moreover, we detected 
Luc signals in lymph nodes from various body regions and in femur 
and tibia bone marrow (Fig. 1g, Extended Data Fig. 2e), as well as Cy5-labelled 
RNA and Thy1.1+CD11c+ cells in the bone marrow (Extended 
Data Fig. 2f). Again, depletion of CD11c+ cells before i.v. injection of 
Luc-LPX substantially reduced the reporter gene signal in these compartments 
(Fig. 1h). 

We previously reported that DCs engulf naked RNA injected into 
lymph nodes by macropinocytosis¹⁷, which is constitutively active 
in immature DCs. RNA-LPX nanoparticles taken up by monocyte- 
derived human immature DCs almost completely co-localized with 
the macropinosome marker dextran (Extended Data Fig. 2g), whereas 
partial co-localization was observed with TLR7 and the early endosome 
marker EEA1 (Extended Data Fig. 2h). Moreover, rottlerin, a macropi- 
nocytosis inhibitor, and cytochalasin D, an inhibitor of phagocytosis 
and macropinocytosis, significantly inhibited RNA-LPX uptake by 
DCs in vitro (Extended Data Fig. 2i). Similarly, uptake of RNA-LPX 
was clearly reduced in vivo when lymph nodes were pre-injected with 
rottlerin (Extended Data Fig. 2j). DC maturation is known to prevent 
macropinocytosis¹⁸, whereas phagocytosis and receptor-mediated 
endocytosis remain unaffected¹⁹. Polyinosinic:polycytidylic acid (poly 
I:C)-matured DCs were unable to internalize Luc-RNA-LPX nanopar- 
ticles in vitro (Extended Data Fig. 2k). Similarly, in mice pre-treated 
with poly I:C before i.v. injection of RNA-LPX, the splenic Luc signal 
and reporter gene expression in CD8+ and CD8− cDCs were strongly 
reduced or completely lost (Extended Data Fig. 2l). Altogether, these 
findings identify macropinocytosis as the major uptake mechanism 
of RNA-LPX. 

Investigating the biological effect of RNA-LPX in vivo, we found that 
a single i.v. injection of RNA-LPX encoding influenza virus hemag- 
glutinin (HA), but not an empty control liposome carrier, induced 
Figure 2 | RNA-LPX vaccines induce TLR7-triggered IFNα production, 
IFNAR-dependent activation of APCs and effector cells, and strong 
expansion of fully functional antigen-specific T cells. a, b, Activation 
markers measured 24 h after i.v. injection of HA-LPX in splenic 
immune cell subsets (n = 3 per time point) and kinetics of IFNα serum 
levels (n = 3 per time point) in wild-type (a) or Tlr7−/− (b) mice. c, IFNα 
serum levels in CD11c+ cell-depleted CD11c-DTR mice (n = 3) after i.v. 
injection of HA-LPX. d, Fraction of IFNα-expressing cells in splenic APC 
subsets after i.v. injection of HA-LPX in C57BL/6 and Ifnar1−/− mice 
(n = 3 per time point). e, Fraction of OVA-specific (left) and gp70-specific 
CD8+ T cells (right) within CD8+ T cells in blood after de novo priming 
of C57BL/6 mice (n = 5) and BALB/c mice (n = 5) immunized i.v. with 
OVA-LPX or gp70-LPX (day 0, 3, 8), respectively. f, Kinetics of OVA-specific 
CD8+ T cell frequencies within CD8+ T cells in blood after i.v. 
immunization of C57BL/6 mice (n = 5) with OVA-LPX. g, Fraction of 
OVA-specific CD8+ T cells within CD8+ T cells in blood of CD11c+ 
cell-depleted BM-chimaeric CD11c-DTR mice (n = 5) immunized i.v. with 
OVA-LPX (day 0, 3). Significance was determined using two-way ANOVA 
and Bonferroni’s multiple comparisons test (a, right, b, right, d), one-way 
ANOVA and Tukey’s multiple comparisons test (d), and unpaired two-tailed 
Student's t-test (b, left, c, e, g). Error bars, mean ± s.d. 


maturation of splenic pDCs, CD8+ and CD8− cDCs, which upregulated 
activation markers CD40 and CD86 (Fig. 2a, left). Activated CD11c+ 
cells redistributed from the red pulp and marginal zone into the T-cell-rich 
white pulp within 6 h after i.v. injection of RNA-LPX (Extended 
Data Fig. 3a). NK, B, CD4+ and CD8+ T cells were also strongly activated 
(Fig. 2a, middle), and a transient burst of serum IFNα peaking 
6 h after RNA-LPX injection occurred (Fig. 2a, right). IFNα is typically 
produced in the context of RNA virus infections by APCs sensing 
dsRNA and ssRNA via endosomal TLR3 and TLR7, respectively, and 
is crucial for an efficient inflammatory, antiviral environment²⁰. In 
Tlr7−/− mice, as compared to C57BL/6 wild-type mice, we found that 
splenocytes were moderately activated after i.v. injection of HA-LPX 
(Fig. 2b, left), and systemic IFNα release was not fully abrogated but 
significantly lower (Fig. 2b, right). By testing Tlr3−/−, Tlr4−/− and 
Tlr9−/− mice, we excluded the contribution of these TLR signalling 
pathways and of DNA or LPS contamination to RNA-LPX-mediated 
effects (Extended Data Fig. 3b). 

In C57BL/6 wild-type mice, expression of activation markers on 
cDCs, pDCs, NK cells, as well as B and T lymphocytes, increased 
continuously after i.v. injection of HA-LPX over a time period of 
24 h. In C57BL/6 mice lacking IFNα receptor 1 (Ifnar1−/−) and in 
BALB/c mice pre-treated with an anti-IFNAR1 blocking antibody, 
cDC, pDC, and NK-cell activation was significantly impaired and 
restricted to the first 6 h, whereas no activation of CD4+ and CD8+ 
T cells and B cells occurred (Extended Data Fig. 3c–e). In CD11c-DTR 
mice depleted of CD11c+ cells before injection of HA-LPX, 
serum IFNα levels were markedly lower (Fig. 2c). These data support 
the role of type I IFN production and indicate that CD11c+ cells are 
the cellular source. 

In C57BL/6 wild-type mice, splenic pDCs but not cDCs secreted 
IFNα immediately after RNA-LPX injection, which began to decrease 
1 h after injection, whereas IFNα production by macrophages steadily 
increased over a period of 3 h (Fig. 2d). In Ifnar1−/− mice, however, 
IFNα secretion by pDCs was moderately reduced, whereas macrophages 
did not produce IFNα. Expression profiling of sorted cells 
showed that macrophages and pDCs upregulate distinct IFNα isoforms 
(Extended Data Fig. 3f). Selective depletion of pDCs in BDCA2-DTR 
mice (expressing diphtheria toxin receptor under control of the Bdca-2 
promoter) or ablation of macrophages by pre-treatment with clodronate 
confirmed the role of these APC subsets for the TLR7-dependent 
biphasic IFNα production (Extended Data Fig. 3g). 

Next, we studied antigen-specific T-cell stimulation upon vaccination. 
A single i.v. dose of HA-LPX induced strong proliferation of 
HA-specific T-cell receptor (TCR)-transgenic CD8+ and CD4+ T cells 
in blood, lymph nodes and spleen (Extended Data Fig. 3h). HA-specific 
T cells co-incubated with splenocytes from HA-LPX-treated mice 
ex vivo were strongly stimulated, indicating that HA-LPX delivered i.v. 
is efficiently internalized in vivo by splenocytes for functional antigen 
presentation (Extended Data Fig. 3i). 

De novo priming of T-cell responses was analysed in C57BL/6 mice 
immunized with RNA-LPX encoding an ovalbumin epitope (OVA-LPX) 
and BALB/c mice immunized with gp70-LPX (an endogenous 
antigen of Moloney murine leukaemia virus integrated into the mouse 
genome). Three rounds of immunization with the respective RNA-LPX 
induced fully functional antigen-specific T cells reaching 30-60% of 
total CD8+ T cells (Fig. 2e, Extended Data Fig. 3j). Notably, repeated 
vaccination with RNA-LPX prevented the typical post-expansion T-cell 
retraction phase and high frequencies of antigen-specific T cells were 
maintained over several weeks (Fig. 2f). Concordantly, re-challenge of 
primed mice with OVA-LPX induced profound CD8+ T cell expansion, 
indicating the formation of memory cells (Fig. 2f, Extended Data 
Fig. 3k). Immune responses were not inducible in CD11c-DTR 
bone-marrow-chimaeric mice depleted of CD11c+ cells (Fig. 2g), 
whereas splenectomized mice vaccinated with RNA-LPX mounted a 
diminished but strong T-cell response, indicating the importance of 
APCs and the contribution of DCs in lymphoid tissues other than the 
spleen to RNA-LPX-mediated immunity (Extended Data Fig. 3l). 

The prophylactic efficacy of RNA-LPX vaccines was assessed in 
two subcutaneous (s.c.) tumour models, B16-OVA and CT26: in both, 
immunization with OVA-LPX or gp70-LPX, respectively, led to com- 
plete and long-lasting protection upon tumour challenge, whereas all 
untreated mice died within less than 30 days (Fig. 3a). 

Therapeutic efficacy was tested in several mouse tumour mod- 
els. In a B16-OVA lung metastasis model, tumour-bearing C57BL/6 
mice were immunized with three doses of OVA-LPX. The RNA-LPX- 
immunized mice cleared lung metastases completely and were free of 
tumours 20 days after the last immunization, whereas the lungs of mice 
immunized with control RNA exhibiting similar immune stimulatory 
properties were tumour-loaded (Extended Data Fig. 4a, b). In a lung 
metastases model with the melanocyte-differentiation antigen TRP-1 
as vaccine target, strong CD8+ and CD4+ T-cell responses against this 
self-antigen were induced, and growing B16F10-Luc tumours were fully 
rejected (Fig. 3b, Extended Data Fig. 4c). Likewise, lung metastases 
derived from Luc-transduced or wild-type CT26 tumour cells were 
efficiently eradicated by vaccination with gp70-LPX (Fig. 3c, Extended 
Data Fig. 4d). Viral oncogenes and mutant neoepitopes are increas- 
ingly under investigation as clinically relevant vaccine-target classes. 
Vaccination with viral oncogene-coding E6/E7-LPX was successful at 
treating mice bearing advanced HPV 16 E6- and E7-expressing TC-1 
Figure 3 | RNA-LPX vaccines mediate rejection of advanced, 
aggressively growing tumours in mice. a, Prophylactic efficacy in 
OVA-LPX immunized C57BL/6 (n = 6) challenged s.c. with B16-OVA 
melanoma and gp70-LPX immunized BALB/c mice (n = 5) challenged and 
rechallenged s.c. with CT26 colon carcinoma. b, B16F10-Luc growth (left) 
and tumour load in lungs (right) of B6 albino mice (n = 12) immunized i.v. 
with TRP-1-LPX, irrelevant (empty vector)-LPX or control (untreated). 
c, CT26-Luc growth and CT26 tumour load in lungs of BALB/c mice 
(n = 4-7) immunized i.v. with gp70-LPX. d, Survival of C57BL/6 mice 
(n = 10) with advanced s.c. TC-1-Luc tumours immunized i.v. with 
E6/E7-LPX or irrelevant (OVA)-LPX. e, Survival of BALB/c mice (n = 10) 
with i.v. CT26-Luc colon carcinoma tumours immunized i.v. with 


tumours and protected C57BL/6 mice against tumour re-growth (Fig. 
3d, Extended Data Fig. 4e). All mice starting treatment on day 7 and 
90% of mice that started on day 10 survived. We vaccinated BALB/c 
mice bearing established rapidly growing CT26 lung tumours with the 
MHC class II neoepitope CT26-M90, derived from the P154S mutation 
in the Aldh18a1 gene identified by exome sequencing”’. This similarly 
led to efficient eradication of the lung tumours, highly significant long- 
term survival and protection from re-challenge, indicative of memory 
T-cell formation (Fig. 3e, Extended Data Fig. 4f). 

To investigate the effect of RNA-LPX-induced IFNα on T cells,
BALB/c mice were repeatedly immunized with gp70-LPX, but pretreated
either with an IFNAR1-blocking antibody or an isotype control
before each immunization. Blocking IFNAR1 did not significantly
affect expansion of gp70-specific CD8⁺ T cells in the blood and spleen
(Fig. 3f). However, antigen-specific T cells primed under IFNAR1-
blocking conditions failed to execute processes of pivotal effector
function, such as secretion of granzyme B, IFNγ and tumour necrosis
factor alpha (TNFα), and mobilization of degranulation marker
CD107a/b (Fig. 3g). In another experiment, BALB/c mice with metastases
derived from CT26 cells were immunized repeatedly with gp70-LPX,
each preceded by injection of anti-IFNAR1 antibody or isotype
control. Under IFNAR1-blocking conditions, lung metastases were only
partially reduced and substantial residual tumour burden remained,
whereas the lung tumours in control mice were completely rejected

CT26-M90-LPX or irrelevant (OVA)-LPX. f, g, De novo priming
in BALB/c mice (n = 3) immunized i.v. with gp70-LPX (day 0, 3, 8)
and injected i.p. with anti-IFNAR1 antibody or isotype before each
immunization. f, Fraction of gp70-specific CD8⁺ T cells within CD8⁺
T cells. g, Splenic CD8⁺ T cells upon in vitro restimulation with no (none),
irrelevant (HA) or gp70 peptide. h, CT26 colon carcinoma load in lungs of
BALB/c mice (n = 5) immunized i.v. with gp70-LPX and injected i.p.
with anti-IFNAR1 antibody or isotype. Significance was determined
using log-rank test (a, d, e), two-way ANOVA and Dunnett’s multiple
comparisons test (b), and one-way ANOVA and Tukey’s multiple
comparisons test (g, h). Error bars, median with interquartile range (b),
otherwise mean ± s.d.


(Fig. 3h). In conclusion, the ability of RNA-LPX to induce an IFNα
response in lymphoid tissues appears to be critical for antigen-specific
CD8⁺ T cells to acquire effector functions and execute potent in vivo
anti-tumour activity.

These findings together with favourable outcomes of safety phar- 
macology studies with clinical grade material in mice and cynomolgus 
monkeys (Extended Data Table 1) encouraged the clinical translation 
of this concept. 

A phase I dose-escalation trial with good manufacturing practice
(GMP)-produced RNA-LPX vaccines encoding four tumour antigens 
(NY-ESO-1, MAGE-A3, tyrosinase and TPTE) for i.v. administration 
is currently recruiting patients with advanced malignant melanoma 
(NCT02410733). The first three patients were treated with a very low 
initial dose of RNA-LPX, followed by four weekly applications with 
moderately higher doses, but still below the absolute therapeutic dose 
levels used in mice (Extended Data Fig. 5a). All vaccine applications 
were well-tolerated with transient flu-like symptoms. All three patients 
had dose-dependent early release of IFNα and IP-10 (also known as
CXCL10) peaking at 6 h (Fig. 4a), resembling the kinetics observed in
mice (Fig. 2a). All patients developed de novo T-cell responses against 
the vaccine antigens. In patient 1, who had no T cells against NY-ESO-1 
at baseline, cell counts of de novo induced NY-ESO-1-specific T cells 
four weeks after the last immunization reached the same range as 
those against immune-dominant HLA-class-I-restricted peptides from

Figure 4 | Clinically administered RNA-LPX vaccines dose-dependently
induce systemic IFNα and de novo priming and amplification of
T cells against vaccine antigens. a, Serum cytokines before (0 h) and
after injection of intra-patiently escalated doses. b, c, T-cell responses
against NY-ESO-1 and tyrosinase determined by restimulation with
overlapping peptide mixtures or NY-ESO-1 epitopes (indicated with the
amino acid position) in IFNγ ELISPOT and NY-ESO-1-specific MHC class
I dextramer staining for patients 1 (b) and 3 (c). CEF, cytomegalovirus,
Epstein-Barr and influenza viruses peptide pool; NM, not measured;
PepMix, peptide mixture; pre-vac., pre-vaccination. d, Mechanism of
action for RNA-LPX. Error bars, mean ± s.e.m.


cytomegalovirus, Epstein-Barr and influenza viruses (Fig. 4b, Extended 
Data Fig. 5b). HLA-B35 NY-ESO-1 dextramer analysis of blood samples 
showed a rapid induction of antigen-specific CD8⁺ T cells within two
weeks of starting treatment, increasing with subsequent vaccinations. 
Moreover, the pre-existing T-cell response against tyrosinase in this 
patient was augmented. Imaging before and after vaccination showed 
regression of a suspected metastatic thoracic lymph node lesion (data 
not shown). Patient 2, whose metastases were surgically removed before 
vaccination, experienced induction of CD4⁺ T-cell responses against
NY-ESO-1 and MAGE-A3 (Extended Data Fig. 5c) and was tumour- 
free at the time of this report (seven months after the start of vaccina- 
tion). In patient 3, a strong NY-ESO-1-specific HLA-Cw03-directed 
de novo T-cell response (Fig. 4c) and a weaker response against 
MAGE-A3 (Extended Data Fig. 5d) were induced. This patient pre- 
viously received various treatments and had eight lung metastases at 
recruitment, which remained clinically and radiologically stable (data 
not shown). 

In summary, our study provides insights into a novel class of sys- 
temically administered nanoparticulate RNA vaccines, which act by 
body-wide delivery of encoded antigens to APCs in the spleen, lymph 
nodes and bone marrow, and concomitant initiation of a strong type-I- 
IFN-driven immunostimulatory program (Fig. 4d). Systemic antigen 
targeting in lymphoid DCs is more potent than local vaccine delivery
(Extended Data Fig. 6) and has large therapeutic potential. To the best of our
knowledge, RNA-LPX vaccines are the first example of a clinically
applied systemic nanoparticulate vaccine that accomplishes this aim.
Whereas the current model involves functionalization of nanoparticles 
with molecular ligands that target DCs®*, we show for the first time that 
precise DC targeting in lymphoid compartments can be accomplished 
using well-known lipid carriers such as DOTMA, DOTAP, DOPE and 
cholesterol, without functionalization, solely by adjusting negative net 
charge of the nanoparticles.
TLR7-driven type-I interferon release, the cellular source of which
was unravelled in our study, appears to be essential for the full anti- 
tumour potency of this vaccine class, in contrast to previous reports 
that interferon counteracts locally applied RNA lipoplex vaccine 
responses'”. Type I IFN as a key molecule in antigen-specific immunity 
against viral infections has further effects, for example, upregulation of 
MHC expression, promotion of maturation and cross-presentation in 
DCs”, and direct inhibition of regulatory T-cell functions*>™4, which 
were not investigated in our study but may contribute to the antitumour 
efficacy of RNA-LPX. Our findings connect effective cancer immu- 
notherapy with host pathogen-defence mechanisms. Mechanisms of 
antiviral host defence are important for survival, conserved in all verte- 
brates and evolutionarily optimized for high sensitivity and potency. 
RNA-LPX vaccines appear to mimic infectious non-self and thus mobi- 
lize concomitantly adaptive and innate antiviral mechanisms”*”°. The 
i.v. delivery of RNA-LPX simulates a viraemic pathogen intrusion in
the blood stream, and by reaching DCs in various lymphoid tissues, 
mobilizes the full T-cell repertoire for adaptive immune responses. 

Our study shows profound expansion of effector T cells, even against 
self-antigens, and antitumour efficacy in various aggressively growing 
mouse tumour models induced by RNA-LPX. The dose-dependent 
IFNα response and stimulation of strong immune responses against
self-antigens observed in the first cohort of patients support the
preclinically identified mode of action and strong potency of this approach
in the clinical setting. RNA-LPX vaccines are fast and inexpensive to 
produce, and virtually any tumour antigen can be encoded by RNA. 
Thus, the nanoparticulate RNA immunotherapy approach introduced 
here may be regarded as a universally applicable novel vaccine class for 
cancer immunotherapy. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper.

METHODS

Mice. C57BL/6 mice, as well as Tlr3−/− and BDCA2-DTR mice bred on a
C57BL/6 background, and BALB/c mice were purchased from the Jackson
Laboratory and Charles River. C57BL/6BrdCrHsd-Tyrᶜ (B6 albino) mice were
purchased from Envigo EMS. C57BL/6 Ifnar1−/− mice, derived from 129Sv Ifnar1−/−
mice via backcrossing, were a gift from J. Kirberg (Paul-Ehrlich-Institute). The
following mouse strains were provided by colleagues from the University Medical
Center of the Johannes Gutenberg University Mainz: Rag2−/− TCR-HA mice
transgenic for the influenza virus hemagglutinin A HA₁₀₇₋₁₁₉ peptide-specific,
I-Eᵈ-restricted T-cell receptor (TCR) (HA-TCRtg); BALB/c mice transgenic for
the HA₅₁₈₋₅₂₆ peptide-specific, H2-Kᵈ-restricted TCR (CL4-TCRtg); and BALB/c
Thy1.1⁺ mice from U. Hartwig; Tlr7−/− mice on a C57BL/6 background from
H. J. Schild; CD11c-DTR and Tlr9−/− mice, both on a C57BL/6 background, from
E. von Stebut-Borschitz; and Tlr4−/− mice from K. Steinbrink. For prolonged ablation of
CD11c⁺ cells, bone-marrow chimaeras were generated by reconstitution of lethally
irradiated (9.5 Gy) C57BL/6 mice with bone marrow cells from CD11c-DTR mice.
Age-matched (6-12 weeks) female animals were used throughout all experiments.
Experimental group sizes were approved by the regulatory authorities for animal
welfare after being defined to balance statistical power, feasibility and ethical
aspects. All mice were kept in accordance with federal and state policies on animal
research at the University of Mainz and BioNTech AG.

Tumour cell lines. B16-OVA is a murine B16F10 melanoma cell line expressing 
the chicken ovalbumin gene (OVA) containing the H2-Kᵇ-restricted OVA₂₅₇₋₂₆₄
epitope (SIINFEKL), which was a gift from U. Hartwig. CT26 (ATCC) is a murine 
colorectal cancer cell line endogenously expressing gp70 which is silent in most 
normal mouse tissues*!. Luc-expressing CT26 (CT26-Luc) and B16F10 tumour 
cells (B16F10-Luc) were generated by lentiviral transduction. The Luc-expressing 
TC-1 tumour cell line (TC-1-Luc) derived from primary lung cells by immor- 
talization and retroviral transduction with HPV16 E6 and E7” as well as Luc 
was obtained from E. Tartour (INSERM U970 PARCC) with the permission of 
T.-C. Wu (Johns Hopkins University). Master and working cell banks were gen- 
erated immediately upon receipt, of which third and fourth passages were used 
for tumour experiments. Cells were tested for mycoplasma every three months. 
Reauthentication of cells was not performed after receipt. 

RNA constructs and in vitro transcription. Plasmid templates for in vitro 
transcription of naked antigen-encoding RNAs were based on pST1-A120 and
pST1-MITD vectors**. pST1-MITD features a signal sequence for routeing to the 
endoplasmic reticulum and the major histocompatibility complex (MHC) class 
I transmembrane and cytoplasmic domains for improved presentation of MHC 
class I and II epitopes. pST1-eGFP-A120 (eGFP), pST1-OVA-MITD (OVA), pST1- 
Influenza-HA-MITD (HA) and pST1-Luciferase-A120 (Luc) vectors were 
described previously. The OVA construct encodes the H-2Kᵇ-restricted,
immunodominant epitope OVA₂₅₇₋₂₆₄, and the HA construct contains a codon-optimized
partial sequence of influenza HA (aa 60-285 fused to aa 517-527; influenza strain 
A/PR/8/34) designed to combine all major immunodominant MHC epitopes. 
pST1-Thy1.1 encodes the murine Thy1.1 protein. pST1-gp70-MITD encodes the 
H-2Lᵈ-restricted peptide antigen AH1₄₂₃₋₄₃₁ derived from Moloney murine leukaemia
virus envelope glycoprotein 70 (gp70) with an amino acid substitution at position
five (V/A; AH1-A5). pST1-E6-MITD and pST1-E7-MITD encode human
papillomavirus (HPV) 16 full-length E6 and E7°%, respectively, and the sequence 
encoding the point-mutated 27-meric peptide CT26-M90 of ALDH18a1 was cloned 
into the pST1-MITD vector. pST1-TRP-1-MITD encodes murine tyrosinase- 
related protein 1 (TRP-1). pST1-empty-MITD does not encode a protein and was 
used as an irrelevant RNA control. RNA was generated by in vitro transcription as 
described previously. For some experiments, Luc and Thy1.1 RNA was synthesized
using 1-methyl-pseudouridine instead of uridine. Labelling of RNA with Cy3
or Cy5 was performed according to the manufacturer's instructions (Amersham 
Biosciences), and of total uridine triphosphate (UTP), 15% were replaced with 
labelled UTP during in vitro transcription of the HA construct. 

Liposomes. Liposomes with positive (cationic) net charge were used to complex
RNA for the formation of RNA-LPX and comprised the cationic lipid DOTMA
(Merck & Cie) or DOTAP (Merck Eprova), and the helper lipid DOPE (Avanti
Polar Lipids or Corden Pharma) or cholesterol (Sigma-Aldrich). Liposomes were
produced either by protocols based on the thin film hydration method or by
an adapted proprietary protocol based on the ethanol injection technique. For
the film method, stock solutions of the individual lipids were prepared in 99.5%
ethanol at a concentration of approximately 10 mg ml⁻¹ (exact concentrations
controlled by HPLC), and appropriate amounts (volumes) of the stock solutions were
mixed according to the intended lipid ratio. The solvent was evaporated and the
obtained lipid film was dried for 1 h using a rotary evaporator. The dry film was
hydrated with RNase-free water by gentle shaking to obtain a raw colloid with a
total lipid concentration of approximately 6 mM, which was left overnight at 4 °C
for equilibration. For size adjustment, the dispersion was then extruded 10 times
through polycarbonate membranes with 200 nm pore size using the LIPEX 10 ml
extruder (Northern Lipids Inc.). The lipid concentration was measured by HPLC
and adjusted by further addition of H₂O to a fixed concentration of the cationic
lipid.
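
The mixing step of the film method (combining ethanol stock solutions according to the intended lipid ratio) amounts to a short unit conversion. The sketch below is illustrative only: the function name, the 1:1 molar ratio, the total lipid amount and the round molar masses are placeholder assumptions and are not values from the protocol; only the nominal 10 mg ml⁻¹ stock concentration is taken from the text.

```python
def stock_volumes_ul(total_lipid_umol, molar_ratio, molar_mass_g_per_mol,
                     stock_mg_per_ml=10.0):
    """Volume (µl) of each lipid stock needed for a target molar ratio.

    total_lipid_umol     -- total amount of lipid to dispense (µmol), hypothetical
    molar_ratio          -- dict lipid -> molar parts, e.g. {"cationic": 1, "helper": 1}
    molar_mass_g_per_mol -- dict lipid -> molar mass (placeholder values here)
    stock_mg_per_ml      -- stock concentration; ~10 mg/ml as stated in the text
    """
    total_parts = sum(molar_ratio.values())
    volumes = {}
    for lipid, parts in molar_ratio.items():
        amount_umol = total_lipid_umol * parts / total_parts
        # µmol x g/mol gives µg; divide by 1000 for mg
        mass_mg = amount_umol * molar_mass_g_per_mol[lipid] / 1000.0
        # mg / (mg/ml) gives ml; multiply by 1000 for µl
        volumes[lipid] = mass_mg / stock_mg_per_ml * 1000.0
    return volumes


if __name__ == "__main__":
    # Placeholder molar masses chosen for easy arithmetic, not real lipid data.
    example = stock_volumes_ul(
        total_lipid_umol=2.0,
        molar_ratio={"cationic": 1, "helper": 1},
        molar_mass_g_per_mol={"cationic": 500.0, "helper": 1000.0},
    )
    for lipid, volume in example.items():
        print(f"{lipid}: {volume:.1f} µl of stock")
```

In practice the HPLC-controlled stock concentration would replace the nominal 10 mg ml⁻¹, and the supplier's molar masses would replace the placeholders.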

RNA-LPX preparation and immunization. Lipoplex formation was performed 
with proprietary protocols, derived from extensive internal formulation devel- 
opment activities. The general procedures were derived from protocols for 
siRNA- and DNA-LPX preparation described elsewhere’. A diversity of for- 
mulations complexed with the reporter firefly luciferase (Luc)-encoding RNA 
was assembled, with liposomes comprising different cationic and helper lipids 
to create various lipid:RNA ratios, which defined the charge ratio and over- 
all RNA-LPX net charge. The charge ratios were calculated from the number 
of positive charges represented by lipid-specific head groups (one positive 
charge per head group) and the number of negative charges represented by 
RNA nucleotides, that is, from the RNA phosphodiester groups (one negative 
charge per phosphodiester). For the calculation of the molar ratio between RNA 
and cationic lipid, a mean molar mass of 330 Da per nucleotide was assumed. 
RNA was provided as a HEPES-buffered solution at an RNA concentration of
1 mg ml⁻¹. RNA-LPX were formed by diluting the RNA with H₂O and 1.5 M
NaCl solution followed by adding an appropriate amount of liposome dispersion
to reach the selected charge ratio at a final NaCl concentration of 150 mM.
RNA-LPX size (triplicates, from each measured ten technical replicates) and 
zeta potential (triplicates) were measured by photon correlation spectros- 
copy (PCS; 380 ZLS submicron particle/zeta potential analyser, PSS Nicomp). 
Uncomplexed RNA and RNA integrity were determined after isolation of 
total RNA by capillary gel electrophoresis (Agilent 2100 Bioanalyzer, Agilent 
Technologies) (2-7 replicates). For formulation screening studies, 20 µg RNA-LPX
corresponding to 0.1 mg ml⁻¹ RNA per mouse were injected i.v. into the
retrobulbar venous plexus. For stability experiments, prepared RNA-LPX were 
pre-incubated with 50% syngeneic mouse serum for 30 min at room tempera- 
ture or stored for 1, 2, 3 or 8 days at 4°C and another 24h at room temperature 
before injection. For immunological and tumour experiments, mice were immunized
three times with 40 µg RNA-LPX unless stated otherwise. The generation
of memory T cells was verified by the recall response 42 days after the last
immunization. Control mice received NaCl or remained untreated. Arrows in 
vaccination schemes indicate immunization. 
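
The charge-ratio bookkeeping described in this section can be sketched as follows. The mean molar mass of 330 Da per nucleotide, the rule of one positive charge per lipid head group and one negative charge per nucleotide, and the dilution of 1.5 M NaCl to a final 150 mM come from the text; the function names and the example amounts (40 µg RNA, a 1.3:2 ratio, 200 µl final volume) are illustrative assumptions.

```python
NUCLEOTIDE_MASS_DA = 330.0  # mean molar mass per nucleotide assumed in the text


def rna_negative_charges_nmol(rna_mass_ug):
    """Negative charges (nmol) in an RNA sample, one charge per nucleotide."""
    # µg / (g/mol) gives µmol; multiply by 1000 for nmol
    return rna_mass_ug / NUCLEOTIDE_MASS_DA * 1000.0


def charge_ratio(cationic_lipid_nmol, rna_mass_ug):
    """Positive:negative charge ratio, one positive charge per lipid head group."""
    return cationic_lipid_nmol / rna_negative_charges_nmol(rna_mass_ug)


def cationic_lipid_needed_nmol(rna_mass_ug, target_ratio):
    """Cationic lipid (nmol) required to reach a chosen charge ratio."""
    return target_ratio * rna_negative_charges_nmol(rna_mass_ug)


def nacl_stock_volume_ul(final_volume_ul, stock_molar=1.5, final_molar=0.150):
    """Volume of NaCl stock giving the final NaCl concentration (C1*V1 = C2*V2)."""
    return final_volume_ul * final_molar / stock_molar


if __name__ == "__main__":
    rna_ug = 40.0               # one immunization dose from the text
    target = 1.3 / 2.0          # hypothetical ratio; below 1 means net negative
    print(f"negative charges: {rna_negative_charges_nmol(rna_ug):.1f} nmol")
    print(f"cationic lipid:   {cationic_lipid_needed_nmol(rna_ug, target):.1f} nmol")
    print(f"1.5 M NaCl:       {nacl_stock_volume_ul(200.0):.1f} µl per 200 µl")
```

A ratio below 1 corresponds to an excess of RNA phosphate charges over lipid head-group charges, that is, a net negative particle.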

Cryogenic transmission electron microscopy. Each sample was preserved in 
vitrified ice supported by holey carbon films on 400-mesh copper grids. Samples 
were prepared by applying a 3 µl drop of sample suspension to a cleaned grid,
blotting with filter paper, and immediately proceeding with vitrification in
liquid ethane. Grids were stored under liquid nitrogen until being transferred to 
the electron microscope for imaging. Electron microscopy was performed using 
a FEI Tecnai T12 electron microscope, operating at 120 keV equipped with a FEI 
Eagle 4k x 4k CCD camera. Vitreous ice grids were transferred into the electron 
microscope using a cryostage that maintains the grids at a temperature below 
—170°C. Images of each grid were acquired at multiple scales to assess the overall 
distribution of the specimen. After identifying potentially suitable target areas for 
imaging at lower magnifications, pairs of high magnification images were acquired 
at nominal magnifications of 110,000× (0.10 nm per pixel), 52,000× (0.21 nm per
pixel) and 21,000× (0.5 nm per pixel). The images were acquired at a nominal
underfocus of −2 µm (110,000×), −4 µm (52,000×) and −4 µm (21,000×),
and electron doses of ~10-24 e⁻ Å⁻². Cryo-electron transmission microscopy
measurements were performed at Nanoimaging Services, Inc. 
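
As a rough guide to what each magnification covers, the image width follows from the pixel size and the detector size. This is a minimal sketch, assuming that the "4k x 4k" camera has 4,096 pixels per side (an assumption; the text does not give the exact pixel count); the pixel sizes are those listed above and the function name is hypothetical.

```python
CAMERA_PIXELS_PER_SIDE = 4096  # assumption for a "4k x 4k" CCD

# Nominal magnification -> pixel size in nm, as listed in the text
PIXEL_SIZE_NM = {110_000: 0.10, 52_000: 0.21, 21_000: 0.5}


def field_of_view_nm(magnification):
    """Width of the imaged area (nm) at a given nominal magnification."""
    return CAMERA_PIXELS_PER_SIDE * PIXEL_SIZE_NM[magnification]


if __name__ == "__main__":
    for magnification in sorted(PIXEL_SIZE_NM, reverse=True):
        width = field_of_view_nm(magnification)
        print(f"{magnification:>7,}x: {width:,.1f} nm across")
```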

Synthetic peptides. Peptides derived from gp70 (H2-Lᵈ-restricted gp70₄₂₃₋₄₃₁
AH1-A5 SPSYAYHQF), HA (H2-Kᵈ-restricted HA₅₁₈₋₅₂₆ IYSTVASSL), OVA
(H2-Kᵇ-restricted OVA₂₅₇₋₂₆₄ SIINFEKL), TRP-1 (H2-Dᵇ-restricted
TRP-1₄₅₅₋₄₆₃ TAPDNLGYA; I-Aᵇ-restricted TRP-1₁₁₃₋₁₂₆ CRPGWRGAACNQKI),
CT26-M90 (LHSGQNHLKEMAISVLEARACAAAGQS) and vesicular stomatitis
virus nucleoprotein (H2-Kᵇ-restricted VSV NP₅₂₋₅₉ RGYVYQGL), NY-ESO-1
(HLA-Cw03 NY-ESO-1₉₁₋₁₀₂ YLAMPFATPMEA (patient 2, 3), NY-ESO-1₉₆₋₁₀₄
FATPMEAEL (patient 2, 3); HLA-A31 NY-ESO-1₅₃₋₆₂ ASGPGGGAPR; HLA-A02
NY-ESO-1₁₅₇₋₁₆₅ SLLMWITQC (patient 3)), MAGE-A3 (HLA-A01
MAGE-A3₁₆₈₋₁₇₆ EVDPIGHLY; HLA-Cw7 MAGE-A3₂₁₂₋₂₂₀ EGDCAPEEK
(patient 2); HLA-A02 MAGE-A3₁₁₂₋₁₂₀ KVAELVHEL; HLA-B44 MAGE-A3₁₆₇₋₁₇₆
MEVDPIGHLY; HLA-A02 MAGE-A3₂₇₁₋₂₇₉ FLWGPRALV (patient 3)) and
overlapping 15-mer peptide mixes for NY-ESO-1 and tyrosinase (patient 1) were
obtained from Jerini Peptide Technologies.

Tissue preparation. Peripheral blood was collected from the orbital sinus. Spleens 
and lymph nodes were stored in PBS (Life Technologies). Spleen single-cell suspensions
were prepared in PBS by mashing tissue against the surface of a 70-µm
cell strainer (BD Falcon) using the plunger of a 3-ml syringe (BD Biosciences). 
Erythrocytes were removed by hypotonic lysis. Lymph nodes were digested with 
collagenase D (1 mg ml⁻¹; Roche) and passed through cell strainers. Bone marrow
cells were flushed from femur and tibia bones, homogenized and filtered, before
erythrocytes were removed by hypotonic lysis.
Flow cytometry. Monoclonal antibodies for extracellular staining included CD4, 
CD8, CD11b, CD40, CD45RA, CD49b, CD69, CD86, Ly6C, NK1.1, Thy1.1, 
Thy1.2 (BD Pharmingen), GR-1 (BioLegend), F4/80 (Invitrogen), CD11c, 
PDCA-1 (Miltenyi Biotec), CD62L, CD127, CD317 and Siglec H (eBioscience). 
Intracellular cytokine staining was performed with antibodies against IFNγ,
TNFα and CD107a and CD107b (BD Pharmingen) and cross-reactive human
granzyme B (Invitrogen) using the cytofix/cytoperm kit (BD Pharmingen), after 
stimulation of 2 x 10° splenocytes with 41g ml“! gp70 AHI1-AS or irrelevant HA 
peptide in the presence of 20 µg ml⁻¹ brefeldin A (Sigma), 40 µg ml⁻¹ GolgiStop
(BD Pharmingen) and CD107a and CD107b for 5 h at 37 °C. Intracellular IFNα
(RMMA-1, R&D Systems) staining was performed using the fixation and perme- 
abilization kit (eBioscience) after incubation of 2 x 10° splenocytes in the presence 
of 201g ml"! brefeldin A (Sigma) for 5h at 37°C. Quantification of OVA-specific 
CD8⁺ T cells with H-2Kᵇ/OVA₂₅₇₋₂₆₄ tetramer (Beckman-Coulter) was previously
described, and CD8⁺ T cells recognizing gp70 AH1-A1 were detected with
H-2Lᵈ/AH1₄₂₃₋₄₃₁ (SPSYVYHQF) tetramer kindly provided by the NIH tetramer
core facility (Emory University Vaccine Center). Antigen-specific CD8⁺ T cells
were determined five days after the last immunization. Viability was determined 
using 7-AAD (Sigma) or fixable viability dye (eBioscience). For patient samples, 
cryopreserved PBMC or freshly isolated PBMC (day 57) were stained for 10 min at 
room temperature in the dark with MHC dextramers (Immudex) bound to peptide 
(HLA-B35 NY-ESO-1₉₄₋₁₀₂ MPFATPMEA (patient 1), HLA-Cw03 NY-ESO-1₉₂₋₁₀₀
LAMPFATPM (patient 3)). Cell surface markers CD3, CD8, CD16, CD14, CD19
(all BD Pharmingen) and CD4 (Biolegend) were subsequently stained along with 
DAPI (BD Biosciences) for 20 min at 4°C in the dark. Antigen-specific CD8* T 
cells were determined within the CD3*CD8*tCD4° lineage™ population. Flow 
cytometric data were acquired on a FACSCanto II or, for patient samples, an LSR 
Fortessa SORP flow cytometer (both BD Biosciences) and analysed with FlowJo 
7.6.5 software (Tree Star). For cell sorting, splenocytes were pre-enriched by simul- 
taneous magnetic depletion of T and B cells using MACS magnetic microbeads 
coated with CD3 or CD19 antibodies and MACS columns (Miltenyi Biotec). cDCs 
(F4/80~ CD11c"), pDCs (F4/80~ CD11c'" PDCA- 1) and macrophages (F4/80™) 
were then sorted on a FACSAria cell sorter (BD Biosciences) according to their 
surface marker expression. Purities of sorted populations: cDCs, 97.8%; pDCs, 
99.7%; macrophages, 98.9%. 
Immunofluorescence staining. Immunofluorescence was performed as previously
described. For colocalization studies with CD11c⁺ cells, 8-µm sections of cryoconserved
spleens were stained. Sections were fixed in 4% paraformaldehyde (PFA)
for 10 min at room temperature in the dark, blocked using PBS supplemented 
with 1% BSA, 5% mouse serum, 5% rat serum and 0.02% Nonidet for 1 h at
room temperature in the dark. Fluorescence-labelled CD11c antibody (clone N418, 
BioLegend) was used to stain sections overnight at 4 °C, followed by nuclear staining
with Hoechst (Sigma). Uptake of Cy3-RNA-LPX by CD11c⁺ cells was revealed
by visualization of CD11c- and Cy3-double-positive cells. Immunofluorescence 
images were acquired using an epifluorescence microscope (ApoTome, Zeiss). 
For biodistribution studies of Cy5-labelled RNA-LPX in the spleen and liver, 
cryoconserved organs were cut and 6-µm sections were fixed in 4% PFA for 10 s at
room temperature in the dark, followed by nuclear staining with DAPI (Sigma). 
Immunofluorescence images were acquired using an epifluorescence microscope 
(Axio Scan.Z1, Zeiss). For in vitro uptake and colocalization studies with Cy3- 
labelled RNA-LPX, treated human monocyte-derived DCs were fixed in 4% PFA 
for 10 min at room temperature in the dark, blocked using PBS supplemented 
with 0.5% BSA, 0.01% saponin, 5% mouse serum and 5% goat serum for 1h at 
room temperature in the dark. Primary antibody staining with TLR7 (polyclonal, 
Novus) and EEA1 (polyclonal, Cell Signaling) was followed by secondary antibody
(anti-rabbit IgG, Jackson ImmunoResearch) and Hoechst (Sigma) staining.
Immunofluorescence images were acquired using a confocal microscope (SP8 
Leica). For the quantification of RNA-Cy3-LPX uptake by DCs in absence or 
presence of inhibitors, images were acquired using an epifluorescence microscope 
(ApoTome, Zeiss). The area (in square pixels) of Cy3⁺ particles in individual cells
was quantified as selected threshold areas using Fiji ImageJ 1.49.
Ex vivo luciferase assay. Single-cell suspensions were prepared from the bone 
marrow of femur and tibia bones from mice 6h after injection of 100,1g Luc-RNA- 
LPX and 5 x 10° cells were plated in 96-well Nunc white plates (Thermo Scientific). 
Cell suspensions were treated with the equal volume of Bright-Glo luciferin reagent 
(Promega), incubated for 3 min on a microplate shaker and bioluminescence was 
measured with an Infinite M200 plate reader (Tecan) with an integration time of 1s. 
Background luminescence measured in cells obtained from untreated mice was
within the range of 15 ± 5 counts per second.
In vitro uptake studies. For uptake and maturation studies with whole blood, 
30 ul fresh whole blood were coincubated with 2 x 10° freshly generated human 


monocyte-derived immature DCs pre-treated with 50 µg ml⁻¹ poly I:C for 40 h at
37 °C or left untreated and transfected with 0.2 µg Luc-LPX. After incubation for
20h at 37°C, luciferase assay was performed as described above. For colocaliza- 
tion and uptake inhibition studies, freshly generated human monocyte-derived 
immature DCs (2.5 x 10° or 5 x 10°) were plated on poly-1-lysine-coated 12-mm 
cover slips or chamber slides (Nunc) and incubated overnight at 37 °C. Cells were 
transfected with 0.8 or 1.25 µg Cy3-labelled RNA-LPX for 10 min and washed
thoroughly with medium to remove extracellular RNA-LPX. For colocalization
studies, cells were co-transfected with RNA-LPX and 1 mg ml⁻¹ FITC-labelled
dextran (70,000 kDa, ThermoFisher) and fixed directly after washing. For costaining
with EEA1 and TLR7, cells were incubated for another 30 min before fixation.
For inhibition studies, cells were treated with 10 µM cytochalasin D (Sigma) for
3 h or 10 µM Rottlerin (Sigma) for 1 h (inhibitors present during transfection)
before transfection, and fixed directly after washing. All conditions were performed
in duplicate.

Bioluminescence imaging. Uptake and translation of Luc-RNA were evaluated 
by in vivo bioluminescence imaging using the Xenogen IVIS Spectrum imaging 
system (Caliper Life Sciences). Unless stated otherwise, an aqueous solution of
D-luciferin (250 µl, 1.6 mg; BD Biosciences) was injected intraperitoneally 6 h
after i.v. injection of 20 µg Luc-RNA-LPX (ex vivo lymph nodes and bone marrow
imaging: 24 h after i.v. injection of 100 µg). Emitted photons from live animals
or extracted tissues were quantified 10 min later with an exposure time of 1 min.
Regions of interest (ROI) were quantified as average radiance (photons s⁻¹ cm⁻² sr⁻¹,
represented by colour bars) (IVIS Living Image 4.0).

Ex vivo fluorescence measurements. Upon organ retrieval, individual tissues were 
homogenized in 500 µl PBS using Lysis Matrix D tubes and the MP Biomedicals
FastPrep-24 5G Instrument. Lysed tissues were directly subjected to fluorescence 
measurements for Cy5 (excitation, 650 nm; emission, 680 nm) using a standard 
fluorescence reader (Tecan Reader, Software i-control). 

Enzyme-linked immunospot (ELISPOT) assay. As described previously”, 5 x 10° 
freshly isolated splenocytes were incubated in a microtiter plate (Merck Millipore) 
coated with anti-IFNγ antibody (10 µg ml⁻¹, AN18, Mabtech) in the presence of
2 µg ml⁻¹ peptide for 18 h at 37 °C, and cytokine secretion was detected with
anti-IFNγ antibody (1 µg ml⁻¹, R4-6A2, Mabtech). For analysis of T-cell responses in
peripheral blood, PBMC were isolated via density gradient centrifugation, pooled
and restimulated with 2 µg ml⁻¹ peptide. From each biological replicate, technical
triplicates were performed. For ex vivo ELISPOT assay from patient samples
(patient 1 and 3), cryopreserved peripheral blood mononuclear cells (PBMC) or 
freshly isolated PBMC (day 57) were used. Cryopreserved PBMC were thawed, 
resuspended in CTS OpTimizer T Cell Expansion serum-free medium (Thermo 
Fisher Scientific) and left for 2-5h at 37°C before performing the assay. For 
in vitro stimulation (IVS) before ELISPOT (patient 2), cryopreserved PBMC were 
thawed in a 37°C water bath and immediately transferred into CTL-Wash serum- 
free wash buffer (Cellular Technology). CD4* and CD8* T cells were purified 
using CD4 or CD8 microbeads (Miltenyi Biotec) according to the manufac- 
turer’s instructions. Positive fractions were resuspended at 1 x 10° cellsml~? in 
DMEM (ThermoFisher Scientific) containing 10% AB-human serum (Thermo 
Fisher Scientific) and left overnight at 37°C before performing the assay. 
Negative fractions were resuspended in RPMI (Thermo Fisher Scientific) containing
5% human serum, 0.5% penicillin-streptomycin, 1× MEM non-essential
amino acids and 1 mM sodium pyruvate (all from Thermo Fisher Scientific) and
left to rest overnight at room temperature before being electroporated with RNA 
encoding vaccine antigens (BioNTech). Electroporated APCs were left to rest for 
3 h at 37 °C, followed by irradiation at 15 Gy. CD4⁺/CD8⁺ effectors and electroporated
and irradiated APCs were coincubated at an effector:target ratio of 2:1.
One day after starting the IVS, fresh culture medium was added together with 
10 U ml⁻¹ IL-2 (Proleukin 2, Novartis Pharma) and 5 ng ml⁻¹ IL-15 (Peprotech).
IL-2 was added once again at the same concentration 7 days after setting up the IVS 
cultures, and the cultures were incubated for another 4 days. During incubation, IVS 
cultures were checked microscopically and fresh medium was added if necessary. 
ELISPOT was performed after 11 days of stimulation (50,000 cells per well). 
On multiscreen filter plates (Merck Millipore) coated with antibodies specific 
for IFNy (1-D1K, Mabtech), 3 x 10° PBMC were stimulated with overlapping 
peptides covering the whole length of the vaccine antigens (PepMix, JPT Peptide 
Technologies) or a mixture of HLA-class-I-restricted peptides from cytomegalo- 
virus, Epstein-Barr and influenza viruses (CEF pool, JPT Peptide Technologies) 
for 16-20h at 37°C. Plates were sequentially incubated with biotin-conjugated 
secondary anti-IFNγ antibody (7-B6-1, Mabtech) and ExtrAvidin-Alkaline
Phosphatase (Sigma-Aldrich) before cytokine secretion was detected by adding 
BCIP/NBT substrate (Sigma-Aldrich). For each patient, technical triplicates were 
performed. Plates were scanned and analysed using the ImmunoSpot Series S5 
Versa ELISPOT Analyzer (S5Versa-02-9038), ImmunoCapture software 6.3 and 
ImmunoSpot software 5.0.3 (all Cellular Technology Ltd).

Depletion and blocking experiments. For depletion of CD11c⁺ cells, CD11c-DTR
mice were treated i.p. with 4 ng g⁻¹ body weight (bone marrow chimaeras: 8 ng g⁻¹
body weight) diphtheria toxin diluted in 200 µl PBS 12 h before RNA-LPX administration
(depletion efficiency of CD11c⁺DTR⁺ cells: >97.2%). CD11cⁱⁿᵗ pDCs
are hardly affected by depletion in CD11c-DTR mice“. For depletion of pDCs, 
BDCA2-DTR mice were treated ip. with 4.5ng g' body weight diphtheria toxin 
diluted in 200 µl PBS (depletion efficiency of CD11b⁻CD11cⁱⁿᵗSiglec-HⁱⁿᵗLy6Cʰⁱ
pDCs: >97.2%). Macrophages were depleted by administration of 50 mg kg⁻¹ body
weight of clodrolip (4.46 mg ml⁻¹ clodronate formulated with 16 mM POPC and
14 mM cholesterol) in PBS i.p. 12 h before RNA-LPX administration (depletion
efficiency of F4/80⁺CD11b⁺CD11cⁱⁿᵗ macrophages: >96.3%). Depletion of
CD11c⁺ cells, pDCs and macrophages was specific; other cells were not affected. In
some experiments, 100 µg anti-IFNAR1 blocking antibody (MAR1-5A3, BioXCell)
or IgG1 isotype control (MOPC-21, BioXCell) diluted in 200 µl PBS was applied
i.p. 1 h before RNA-LPX injection.

Cellular uptake, splenocyte activation, in vivo cytokine secretion. Mice were
injected i.v. with 40 µg Cy3- or Cy5-labelled RNA-LPX or 80 µg eGFP RNA-LPX
and spleens were recovered 1 h (Cy3, Cy5) or 24 h later (eGFP). For splenocyte
activation and cytokine secretion, mice were injected i.v. with 40 µg HA RNA-LPX.
Unless stated otherwise, splenocytes were prepared 24 h after injection to measure
median fluorescence intensity of CD40, CD86 and CD69 expression on immune
cell subsets. Serum was collected and production of IFNα was determined from
samples stored at −20 °C (pan-IFNα ELISA kit, PBL Assay Science). For patient
samples, serum was stored at −80 °C and IFNα and IP-10 levels were determined
using the pan-IFNα ELISA kit (PBL Assay Science) and ECLIA multiplex assay
(Meso Scale Discovery), respectively. Patient samples were measured in duplicate.
Quantification of RNA in blood was performed by quantitative RT-PCR using
antigen sequence-specific primers (IMGM Laboratories).

In vivo proliferation assay. Splenocytes (1 × 10⁷) from BALB/c Thy1.2⁺
CL4-TCR-transgenic mice labelled with 1 µM carboxyfluorescein diacetate,
succinimidyl ester (CFSE, Invitrogen) were adoptively transferred into BALB/c Thy1.1⁺
mice and immunized i.v. with 40 µg HA-RNA-LPX 18 h after transfer. Controls
received Thy1.2⁺ CL4-TCR-transgenic CD8⁺ T cells but were not immunized.
Four days after immunization, peripheral blood, splenocytes and LN cells were
stained for Thy1.2⁺CD8⁺ T cells and proliferation was analysed by flow cytometry.

Tumour models. Protective immunity: BALB/c or C57BL/6 mice were immunized
repetitively with 40 µg RNA-LPX. After the last immunization, 2 × 10⁵ CT26 or
B16-OVA tumour cells, respectively, were inoculated s.c. into the flanks of mice.
Therapeutic immunity: 1 × 10⁵ TC-1 tumour cells were inoculated s.c. and mice
were immunized three times with 40 µg RNA-LPX (for E6/E7, 20 µg each). Tumour
sizes were measured unblinded with a caliper every three to four days for
calculating tumour volumes using the equation (a² × b)/2 (a, width; b, length). Animals
were euthanized when exhibiting signs of impaired health or when the length
of the tumour exceeded 15 mm. Metastasis models: 2 × 10⁵ CT26, CT26-Luc
(immunization with CT26-M90 RNA-LPX: 5 × 10⁵), B16-OVA or 3 × 10⁵ B16F10-Luc
tumour cells were injected i.v. and immunizations were initiated on day four
after tumour inoculation. In some experiments, 1 mg of anti-IFNAR1 blocking
antibody (MAR1-5A3, BioXCell) or IgG1 isotype control (MOPC-21, BioXCell)
diluted in 200 µl PBS was applied i.p. 6 h before RNA-LPX injection. CT26-Luc
and B16F10-Luc tumour growth kinetics were determined unblinded by
bioluminescence in vivo imaging. Mice were randomized based on their average radiance
values (ANOVA-P method, Daniel’s XL Toolbox V6.53). When exhibiting impaired
breathing, mice were killed and tumour burden was quantified unblinded after
intratracheal ink (85 ml H₂O, 15 ml ink, two drops of ammonia water) injection
(this step was omitted for the B16-OVA model) and fixation with Fekete’s solution (5 ml
70% ethanol, 0.5 ml formalin and 0.25 ml glacial acetic acid). After 2–6 h, tumour
lesions were bleached whereas normal lung tissue remained stained.
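
The caliper-based volume estimate and the length-based endpoint described under Tumour models can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names `tumour_volume_mm3` and `exceeds_length_endpoint` and the example measurements are hypothetical, and only the equation (a² × b)/2 and the 15 mm limit come from the text.

```python
def tumour_volume_mm3(width_mm: float, length_mm: float) -> float:
    """Estimate tumour volume as (a^2 * b) / 2, with a = width and b = length."""
    if width_mm < 0 or length_mm < 0:
        raise ValueError("caliper measurements must be non-negative")
    return (width_mm ** 2 * length_mm) / 2.0


def exceeds_length_endpoint(length_mm: float, limit_mm: float = 15.0) -> bool:
    """Return True when tumour length exceeds the 15 mm limit given in the text."""
    return length_mm > limit_mm


# Hypothetical caliper reading: 6 mm wide, 10 mm long.
example_volume = tumour_volume_mm3(6.0, 10.0)
example_endpoint = exceeds_length_endpoint(10.0)
```

Width enters squared because it stands in for both short axes of the tumour, so a measurement error in width affects the estimate more strongly than the same error in length.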

Clinical trial design. The study protocol was approved by the relevant
authority and ethics committee. The study was conducted in accordance with all
applicable laws and regulations and in agreement with the ICH-GCP guidelines and the
Declaration of Helsinki (Fortaleza 2013). Written informed consent from all
patients was obtained before enrolment. For patient treatment,
tumour-antigen-encoding RNA-LPX were prepared from GMP-manufactured components
(BioNTech) in a dedicated pharmacy under GMP. Patients were injected i.v. with
weekly escalating doses of RNA-LPX encoding antigens NY-ESO-1, tyrosinase,
MAGE-A3 and TPTE (1.9, 3.6 or 7.2 µg RNA-LPX of each antigen; see
vaccination schemes in Extended Data Fig. 5a). Blood samples were obtained
for cytokine measurements on days 1, 8 and 15 (0 h (pre-vaccination) and 2, 6 and 24 h
after each vaccination), and for ELISPOT and MHC class I dextramer staining on
days 0 (pre-vaccination), 15, 22, 29 and 57. Blood samples for T-cell monitoring were
obtained before vaccination on the respective vaccination day.

Statistical analyses and data presentation. All results are expressed as mean ± s.d.,
mean ± s.e.m. or median with or without interquartile range as indicated. Biological
replicates were used in all experiments unless stated otherwise. Unpaired two-tailed
Student's t-test was used for comparison of two groups. One-way analysis of
variance (ANOVA) was performed when more than two groups were compared, and
when determined significant (P < 0.05), multiple comparisons were performed
using Tukey’s post-hoc test. Two-way ANOVA was performed when both time
and treatment were compared, and when significant (P < 0.05), multiple
comparisons were performed using Bonferroni post-hoc tests or Dunnett's post-hoc test
(Fig. 3b). Survival benefit was determined with the log-rank test. All statistical
analyses were performed with GraphPad PRISM 6.01. *P < 0.05, **P < 0.01,
***P < 0.001. In all experiments, representative images, dot plots and histograms
are shown. Values below detection limit are marked with <LDL (lower detection
limit). No statistical methods were used to pre-determine sample size for animal
or other experiments.

Extended Data Figure 1 | Physicochemical characteristics and
biological activity of RNA-LPX constituted from different lipids at
various charge ratios. a, Bioluminescence imaging of Luc expression
in BALB/c mice 6 h after i.v. injection of different transfection reagents
and controls: PBS (n = 3), 60 µg Luc-RNA alone (n = 3), 25 µg Luc-RNA
complexed with TransMessenger (Qiagen) (n = 3), 5 µg Luc-RNA
complexed with Viromer RED (Lipocalyx) (n = 3). b, Cryo-TEM images
of Luc-LPX constituted at a positive:negative ((+):(−)) charge ratio of
1.3:2 with DOTMA/DOPE liposomes. Scale bar, 100 nm. c, Fraction
of uncomplexed RNA in Luc-LPX preparations constituted at different
charge ratios with DOTMA/DOPE liposomes determined by capillary
gel electrophoresis (n = 2–7). d, Particle size, polydispersity index (left)
and zeta potential (right) (n = 3) of RNA-LPX constituted with
Luc-RNA and differently constituted liposomes at various charge ratios.
e, Bioluminescence imaging of BALB/c mice (n = 3) after i.v. injection
of Luc-LPX constituted with different liposomes at various charge ratios
corresponding to d. Pie charts show relative contribution of each organ
to total signal. f, Relative biodistribution of Luc expression in explanted
organs of BALB/c mice (n = 3) after i.v. injection of Luc-LPX constituted
with DOTMA/DOPE liposomes at a charge ratio of (+):(−) of 1.3:2 or
Luc-RNA alone. g, Luc expression in human immature DCs transfected with
5 µg Luc-LPX constituted freshly or stored after constitution for indicated
time periods at 4 °C (left) or room temperature (right). RNA-LPX tested in
duplicates (stored) or quadruplets (fresh). Each bar represents triplicates.
h, Particle size (upper left) and percentage of RNA integrity (upper right)
of Luc-LPX (n = 1) incubated in 50% mouse serum for indicated time
periods at 37 °C. Bioluminescence imaging of Luc expression in BALB/c
mice (n = 5) after i.v. injection of Luc-LPX preincubated in 50% mouse
serum for 30 min at 37 °C (lower left and right). NM, not measured. Error
bars, median with interquartile range (h), otherwise mean ± s.d.

Extended Data Figure 2 | Biodistribution and cellular uptake
mechanism of RNA-LPX vaccines. a, Uptake of Cy5-labelled RNA in
splenic cell subsets of C57BL/6 mice (n = 3) 1 h after i.v. injection of 40 µg
Cy5-labelled RNA-LPX. b, Localization of CD11c and Cy3 double-positive
cells in the spleen of BALB/c mice (n = 2) 1 h after i.v. injection of
40 µg Cy3-labelled RNA-LPX. Nuclear staining in blue. Scale bar, 100 µm.
c, Half-life of RNA-LPX in circulation analysed by quantitative RT-PCR
in male and female C57BL/6 mice (n = 5 per time point) after injection of
60 µg RNA-LPX constituted with NY-ESO-1, tyrosinase, MAGE-A3 and
TPTE RNA (15 µg each). d, Localization of Cy5⁺ (upper left) or Thy1.1⁺
cells (lower left) in spleen and liver of BALB/c mice (n = 5) determined
by microscopy or flow cytometry 1 h or 20 h after i.v. injection of 40 µg
Cy5-labelled RNA-LPX or 40 µg 1-methyl-pseudouridine-modified
Thy1.1-LPX, respectively. Nuclear staining in blue. Scale bar, 50 µm (top),
20 µm (bottom). Biodistribution of Cy5 signal in homogenized organs of
BALB/c mice (n = 2) (right). Note the signal in the liver is overestimated
in this analysis owing to the strong signal in the gall bladder, probably
reflecting biliary secreted free dye. e, Bioluminescence imaging of lymph
nodes of BALB/c mice (n = 3) 18 h after i.v. injection of 40 µg
1-methyl-pseudouridine-modified Luc-LPX. ax, axillary; ing, inguinal; mand,
mandibular. f, Flow cytometry analysis of Cy5 and Thy1.1 expression in
CD11c⁺ cells in the bone marrow of C57BL/6 mice (n = 3) 1 h or 20 h
after i.v. injection of 40 µg Cy5-labelled RNA-LPX or 40 µg
1-methyl-pseudouridine-modified Thy1.1-LPX, respectively. g, h, Localization
of Cy3-labelled RNA in human immature DCs after co-transfection of
1.25 µg Cy3-labelled RNA-LPX at a charge ratio of (+):(−) of 1.3:2 and
3:1 with dextran (g) or of 1.3:2 after staining for TLR7 or EEA1
(h). Nuclear staining in blue. Scale bar, 10 µm. i, Visualization and
quantification of inhibited uptake of positively as well as negatively
charged Cy3-labelled RNA-LPX in human immature DCs pretreated
with rottlerin or cytochalasin D. Scale bar, 10 µm. j, Bioluminescence
imaging of lymph nodes of BALB/c mice (n = 3) injected intranodally
with 10 µM rottlerin in 10 µl PBS 15 min before i.v. injection of 80 ng
Luc-LPX. k, Luminescence assay of whole blood enriched or not enriched
with human immature DCs pretreated with poly I:C or not (control)
before transfection with Luc-LPX at a charge ratio of 1.3:2. WB, whole
blood. l, Poly-I:C-induced maturation determined by CD86 expression
(left), bioluminescence imaging (middle) and eGFP expression in splenic
cDC subsets (right) upon injection of BALB/c mice (n = 3) with 50 µg
poly I:C i.p. 12 h before i.v. injection of 20 µg Luc-LPX or 80 µg eGFP-LPX,
respectively. Significance was determined using unpaired two-tailed
Student's t-test (d, lower left; f; l, middle) and one-way ANOVA and
Tukey’s multiple comparisons test (i–k; l, right). Error bars, mean ± s.e.m.
(k) or mean ± s.d. otherwise.

Extended Data Figure 3 | Systemic TLR7- and IFNAR-dependent
activation of APCs and effector cells, IFNα production and strong
expansion of fully functional antigen-specific T cells induced by
RNA-LPX vaccines. a, Localization of splenic CD11c⁺ cells at baseline
(top) and 6 h after i.v. injection of 40 µg HA-LPX (bottom) into BALB/c
mice (n = 2). Nuclear staining in blue. Scale bar, 100 µm. RP, red pulp;
WP, white pulp. b–e, Activation marker expression in splenic cell subsets
and kinetics of IFNα serum levels after i.v. injection of mice (n = 3 per
time point) with HA-LPX in Tlr3⁻/⁻, Tlr4⁻/⁻ and Tlr9⁻/⁻ mice (b),
in Ifnar1⁻/⁻ mice (c, d), or in BALB/c mice treated with 100 µg
anti-IFNAR1 antibody or isotype i.p. 1 h before i.v. injection of HA-LPX
(e). Ab, antibody. f, mRNA levels of IFNα isoforms in sorted splenic
APC subsets of C57BL/6 mice (n = 3) 1 h after i.v. injection of HA-LPX
determined by qRT-PCR. Data expressed as log-fold change, as compared
to control animals. g, IFNα serum levels after i.v. injection of HA-LPX in
BDCA2-DTR mice (n = 3 per time point) depleted (depl) of pDCs (left)
and in C57BL/6 mice (n = 3 per time point) depleted of macrophages
(right). h, CFSE proliferation profile of HA-specific CD4⁺ T cells in
lymphoid compartments of BALB/c Thy1.1⁺ mice (n = 3) after adoptive
transfer of HA-specific Thy1.2⁺ HA-TCR-transgenic CD4⁺ T cells and
subsequent immunization with HA-LPX or control (untreated). Fraction
of proliferated cells indicated. tg, transgenic. i, Priming of naive
HA-specific CD8⁺ T cells ex vivo. BALB/c mice (n = 3) were immunized with
80 µg HA-LPX, irrelevant (eGFP)-LPX or NaCl (control). Splenocytes
were prepared 12 h later and co-incubated with CFSE-labelled
CL4-TCR-transgenic CD8⁺ T cells isolated using MACS magnetic microbeads coated
with CD8 antibodies at an effector:target ratio of 1:6. Four days later,
proliferation profiles were analysed by flow cytometry. Numbers indicate
the percentage of proliferated cells. j, Fraction of cytokine-secreting
CD8⁺ T cells within CD8⁺ T cells in the spleen upon de novo priming
in C57BL/6 mice (n = 5) immunized i.v. (day 0, 3, 8) with OVA-LPX
after in vitro restimulation with no (none), irrelevant VSV (irrelevant)
or OVA peptide and intracellular cytokine staining (top). Spleen ex vivo
ELISPOT assay upon de novo priming in BALB/c mice (n = 5) immunized
i.v. (day 0, 3, 8) with gp70-LPX. Stimulation with no (none), irrelevant
HA (irrelevant) or gp70 peptide (lower left). gp70-specific cytotoxicity
in vivo (lower right). BALB/c mice (n = 5) were immunized i.v. (day 0,
3, 8) with 40 µg gp70-LPX. Naive splenocytes were labelled with 0.5 or
5 µM CFSE and pulsed with peptide (6 µg ml⁻¹) five days after the last
immunization, and target cells (2 × 10⁷) were adoptively transferred into
immunized recipients i.v. (irrelevant HA-loaded CFSEˡᵒʷ:gp70-loaded
CFSEʰⁱᵍʰ = 1:1). Recipient splenocytes were analysed by flow cytometry
18 h after transfer, and antigen-specific lysis was determined: specific lysis
(%) = (1 − (percentage of cells pulsed with gp70/percentage of cells pulsed
with HA)) × 100. k, Expression of memory markers CD127 and CD62L
in gp70-specific, CD44⁺CD8⁺ T cells compared to non-specific CD8⁺
T cells in blood (day 19) and spleen (day 67) of BALB/c mice (n = 3) after
priming with gp70-LPX (day 0, 7, 14). l, Fraction of gp70-specific CD8⁺
T cells within total CD8⁺ T cells in blood, bone marrow and lymph nodes
determined by MHC class I tetramer staining after de novo priming of
splenectomized BALB/c mice (n = 5–7) immunized with gp70-LPX (day 0,
7) or left untreated (control). Significance was determined using unpaired
two-tailed Student's t-test (b left, c), two-way ANOVA and Bonferroni's
multiple comparisons test (b right, g) and one-way ANOVA and Tukey’s
multiple comparisons test (j, l). Error bars, mean ± s.d.
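
The antigen-specific lysis formula given for the in vivo cytotoxicity assay in panel j can be written out as a short calculation. The sketch below is illustrative only: the function name `specific_lysis_percent` and the example percentages are hypothetical and are not data from the study.

```python
def specific_lysis_percent(pct_gp70_pulsed: float, pct_ha_pulsed: float) -> float:
    """Specific lysis (%) = (1 - gp70-pulsed % / HA-pulsed %) * 100.

    pct_gp70_pulsed: percentage of recovered target cells pulsed with gp70 peptide.
    pct_ha_pulsed:   percentage of recovered cells pulsed with irrelevant HA peptide.
    """
    if pct_ha_pulsed <= 0:
        raise ValueError("the HA-pulsed reference population must be above 0%")
    return (1.0 - pct_gp70_pulsed / pct_ha_pulsed) * 100.0


# Hypothetical recovery: 10% gp70-pulsed cells versus 40% HA-pulsed cells.
example_lysis = specific_lysis_percent(10.0, 40.0)
```

Because the two target populations are transferred at a 1:1 ratio, equal recovery of both gives 0% specific lysis, and complete loss of the gp70-pulsed population gives 100%.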


© 2016 Macmillan Publishers Limited. All rights reserved 




Extended Data Figure 4 | Potent antitumour immunity and rejection of
advanced aggressively growing tumours in mice conferred by RNA-LPX
vaccines. a, B16-OVA melanoma load in lungs of C57BL/6 mice (n = 8)
immunized i.v. (days 4, 7, 11) with OVA-LPX or irrelevant (eGFP)-LPX.
b, Expression of activation markers measured 24 h after i.v. injection of
40 µg irrelevant (empty vector)-LPX, eGFP-LPX or OVA-LPX by flow
cytometry in splenic immune cell subsets (n = 3) and IFNα serum
levels (n = 3) 6 h after injection in C57BL/6 mice. c, Bioluminescence
signal of tumours in different groups before immunization and on
day 25 (upper left), tumour load and lung weights (upper right) and
TRP-1-specific CD8⁺ and CD4⁺ T-cell responses in spleens of control
(untreated), irrelevant (empty vector)-LPX and TRP-1-LPX-immunized
B6 albino mice (n = 12) on day 25 detected by ELISPOT assay (bottom),
depicted in Fig. 3b. d, Bioluminescence imaging of CT26-Luc carcinoma
in BALB/c mice (n = 4–7) depicted in Fig. 3c (left). e, TC-1-Luc tumour
growth in C57BL/6 mice (n = 10) (left), depicted in Fig. 3d, and remission
of established advanced TC-1-Luc tumours in C57BL/6 mice (n = 10)
immunized i.v. with 40 µg E6/E7-LPX (days 13, 20, 27) (right). f, Survival
of BALB/c mice rechallenged with CT26-Luc colon carcinoma cells on
day 109, depicted in Fig. 3e. Significance was determined using one-way
ANOVA and Tukey’s multiple comparisons test (c), two-way ANOVA and
Bonferroni’s multiple comparisons test (d), paired two-tailed Student's
t-test (f, right), unpaired two-tailed Student's t-test (f, far right), and
log-rank test (f, left). Error bars, median with interquartile range (d),
mean ± s.d. otherwise.

Extended Data Figure 5 | Clinical application of RNA-LPX vaccines
and de novo priming and amplification of patient T-cell responses
against encoded vaccine antigens. a, Vaccination scheme and monitoring
for patients 1–3. b, Antigen-specific T-cell responses against NY-ESO-1
and tyrosinase determined by restimulation with overlapping peptide
mixtures in IFNγ ELISPOT for patient 1. c, Antigen-specific T-cell
responses against NY-ESO-1 and MAGE-A3, determined by post-IVS
IFNγ ELISPOT assay at indicated days for patient 2. Values are corrected
for background (no peptide). d, Antigen-specific T-cell responses against
NY-ESO-1 and MAGE-A3, determined by ex vivo IFNγ ELISPOT assay at
indicated days for patient 3. Numbers in ELISPOT data indicate the amino
acid position of each epitope. Significance was determined using unpaired
two-tailed Student’s t-test. Error bars, mean ± s.e.m.

Extended Data Figure 6 | Comparison of i.v. and s.c. routes for
RNA-LPX administration in the context of T-cell priming and
biodistribution of RNA-LPX upon s.c. administration. Fraction of
OVA-specific CD8⁺ T cells within CD8⁺ T cells on day 13 in blood after de novo
priming of C57BL/6 mice (n = 5) immunized i.v. with OVA-LPX (day 0, 3, 8)
(left). Biodistribution of Luc expression 24 h after s.c. injection of
Luc-LPX in BALB/c mice (n = 3) (right). Signal can only be observed at the
injection site and the draining lymph node. Significance was determined
using one-way ANOVA and Tukey’s multiple comparisons test. Error bars,
mean ± s.d.

Extended Data Table 1 | Findings of non-GLP pilot pharmacokinetics and pharmacodynamics study in cynomolgus monkeys

Category | Findings
Local tolerance | No test item-related reactions at daily inspections of the infusion sites.
Mortality | No mortality occurred during the course of the study.
Clinical signs | No signs of systemic toxicity were noted for any of the treated animals.
Body weight and body weight gain | No test item-related changes.
Food and drinking water consumption | No test item-related changes.
Electrocardiography | No test item-related changes. In more detail, the quantitative evaluation of the ECG obtained on test days 15/16 did not reveal any test item-related influence on the heart rate, the RR interval, the QRS interval, the QT interval, the QTc values and the PQ interval for any of the animals treated in comparison to the control animals.
Troponin-I | No test item-related changes.
Circulatory functions | Normal levels of peripheral arterial systolic, diastolic and mean blood pressure reported for all animals.
Haematology | No influence on haematological parameters was noted for the liposome-treated group. A transient decrease of lymphocytes and a transient increase of neutrophils were found as test item-related findings in a dose-dependent manner, but were back to normal levels within 48 h.
Clinical biochemistry | No test item-related influence was rated on the biochemical parameters for the animals; one animal showed high LDH and CK values considered to be stress-related.
Cytokine | IL-6 showed a dose-dependent and test item-related induction. Cmax levels were reached at 30 min after completion of the treatment, and were back to predose levels after 24 h. In 2 animals, IFNα was also detected.
Complement | No test item-related changes of complement factor 3a were noted.

ECG, electrocardiography. LDH, lactate dehydrogenase. CK, creatine kinase.

Successful treatment of many patients with advanced cancer
using antibodies against programmed cell death 1 (PD-1; also
known as PDCD1) and its ligand (PD-L1; also known as CD274)
has highlighted the critical importance of PD-1/PD-L1-mediated
immune escape in cancer development. However, the genetic
basis for the immune escape has not been fully elucidated, with the
exception of elevated PD-L1 expression by gene amplification and
utilization of an ectopic promoter by translocation, as reported
in Hodgkin and other B-cell lymphomas, as well as stomach
adenocarcinoma. Here we show a unique genetic mechanism of
immune escape caused by structural variations (SVs) commonly
disrupting the 3′ region of the PD-L1 gene. Widely affecting
multiple common human cancer types, including adult T-cell
leukaemia/lymphoma (27%), diffuse large B-cell lymphoma (8%),
and stomach adenocarcinoma (2%), these SVs invariably lead to a
marked elevation of aberrant PD-L1 transcripts that are stabilized
by truncation of the 3′-untranslated region (UTR). Disruption of the
Pd-l1 3′-UTR in mice enables immune evasion of EG7-OVA tumour
cells with elevated Pd-l1 expression in vivo, which is effectively
inhibited by Pd-1/Pd-l1 blockade, supporting the role of relevant
SVs in clonal selection through immune evasion. Our findings not
only unmask a novel regulatory mechanism of PD-L1 expression,
but also suggest that PD-L1 3′-UTR disruption could serve as a
genetic marker to identify cancers that actively evade anti-tumour
immunity through PD-L1 overexpression.

Structural variations, including translocations, inversions, tandem
duplications, and deletions, are widely observed across cancer
genomes. Of particular interest are those involving non-coding
sequences recently reported for activation of several oncogenic
drivers, including GFI1 (ref. 13) and TERT, which have been less
intensively investigated to date owing to the high-level complexity
of SVs in cancer genomes. To clarify novel oncogenic mechanisms
through such SVs, we recently developed a robust platform
for sensitive capture of a wide spectrum of SVs based on whole-genome
sequencing (WGS) (detailed in Methods), which was initially
applied to a set of WGS data from 49 cases of adult T-cell
leukaemia/lymphoma (ATL), a retrovirus-associated aggressive peripheral
T-cell neoplasm. RNA sequencing (RNA-seq) data were also
available for 43 samples (Extended Data Fig. 1a and Supplementary
Table 1).

Genome-wide mapping of SV-associated breakpoints revealed a
number of recurrent breakpoint cluster regions. Among these, the most
prominent corresponded to breakpoints at chromosome 9p24.1 found
in 13 (26.5%) samples, which were narrowly clustered in a 3.1 kilobase
(kb) region within the 3′ region of the PD-L1 locus (Extended Data
Fig. 1b and Supplementary Table 2). Depending on samples, a variety
of SV types were observed, including a large deletion (n = 1), tandem
duplications (n = 4), inversions (n = 4), and translocations (n = 4)
(Fig. 1a and Extended Data Fig. 1c). However, irrespective of
underlying SV types, an aberrant PD-L1 allele was generated in all cases,
where the authentic 3′ exons were replaced by an ectopic sequence
derived from the rearranged loci (n = 12) or a short 327 base pair (bp)
sequence within the last exon was inverted (ATL017). It was
apparent that these SVs were invariably associated with markedly elevated
expression of PD-L1, except for a single case (ATL068) with very
low tumour content (Fig. 1b). As expected from the underlying SV
structure, all overexpressed PD-L1 transcripts underwent structural
alterations, which, on the basis of RNA-seq, fused varying lengths
of the 5′ region of the PD-L1 sequence to a short tract of intronic or
intergenic sequence derived from exogenous loci containing a
putative polyadenylation (poly-A) signal (n = 10), or caused a premature
termination within the authentic 3′-UTR using an alternative poly-A
signal (ATL033 and ATL050) (Fig. 1c and Extended Data Figs 1d and
2a–d). In the remaining case (ATL017), unexpected cleavage of PD-L1
transcripts occurred within the inverted 327 bp sequence of the
3′-UTR using a newly created poly-A signal sequence (Extended Data
Fig. 2e). None of these aberrant PD-L1 transcripts retained the intact
3′-UTR and therefore, the scarcity of 3′-UTR reads in RNA-seq in the
majority of SV(+) samples indicated the predominance of the aberrant
versus intact PD-L1 transcripts, which was reflected by high relative

involving PD-L1 3′-UTR. a, Different types of SVs commonly affecting
3′ region of PD-L1 are shown by indicated colours. b, PD-L1 exon 4
expression (RPKM) in 43 ATL samples, coloured by PD-L1 SV status.
Welch’s t-test. c, Genomic structure of the rearranged PD-L1 locus
and transcription (orange dotted lines) in a case (ATL020) with
3′-UTR-truncated PD-L1 transcripts, in which PD-L1 ORF is terminated
before exon 7 and merged into an INSL6 intronic sequence. Breakpoints
(blue dotted lines) are shown with accompanying copy number (CN)
alterations. Amp, amplification. d, PD-L1 exon 4 expression (RPKM) and
its relative value to that of 3′-UTR (exon 4 to 3′-UTR ratio) for 43 ATL
cases. SV(+) cases are indicated by colours corresponding to each SV type.
e, Predicted structures of wild-type and two representative
C-terminal-truncated PD-L1 fusion proteins. Sig, signal peptide; Ig, immunoglobulin;
TM, transmembrane; Cyto, cytoplasmic. f, Summary of PD-L1 and PD-L2
expressions in ATL cells with (n = 7) or without PD-L1 SVs (n = 9).
Student’s t-test. g, h, Representative plots for PD-L1 surface expression (g)
and PD-1 Ig binding (h) for ATL cells with (n = 7) or without relevant SVs
(n = 9). i, IHC of PD-L1 SV(+) cases harbouring intact or truncated ORFs,
compared with SV(−) cases using antibodies (Abs) specifically detecting
N-terminal and C-terminal domains of PD-L1. Scattered stained cells are
macrophages. See Extended Data Figs 1–4.


expression of exon 3 or 4 over 3′-UTR (>3.5) in SV(+) samples
(Fig. 1d and Extended Data Fig. 3).

The entire PD-L1 open reading frame (ORF) was completely
preserved in six SV(+) cases, whereas in the remaining cases, the ORF
was interrupted by a rearrangement within intron 5 (n = 4) or 6 (n = 3),
causing a premature truncation of the protein (Fig. 1e and Extended
Data Fig. 4a). All of the predicted proteins from the aberrant
transcripts retained the extracellular receptor-binding and
transmembrane domains of PD-L1 and were thought to be expressed on the
cell surface with preserved receptor-binding capacity, which was
confirmed for representative PD-L1 variants derived from SV(+) samples
(Extended Data Fig. 4b–d). Cell surface expression of different PD-L1
proteins was also demonstrated in SV(+) primary ATL samples, which
showed prominent overexpression of the proteins compared to SV(−)
samples (Fig. 1f, g). By contrast, expression of PD-L2 (also known as
PDCD1LG2), another PD-1 ligand encoded in the 3′ vicinity of the
PD-L1 locus, remained the same between SV(+) and SV(−) samples
(Fig. 1f and Extended Data Fig. 1c), indicating that these SVs did not
affect PD-L2 expression and also that increased cell surface binding
of PD-1 in these SV(+) samples (Fig. 1h) could be explained by
overexpressed PD-L1 proteins. Elevated expression of PD-L1 proteins in
SV(+) samples was further evidenced by immunohistochemistry
(IHC) and western blotting (Fig. 1i and Extended Data Fig. 4e). Note
that in SV(+) samples with a truncated ORF, PD-L1 expression was
detected only with an antibody directed against the N-terminal, but
not the C-terminal, domain.

The highly recurrent nature of SVs converging on PD-L1 3′-UTR
and their common consequence of markedly elevated expression of
aberrant but apparently functional PD-L1 proteins in ATL provide
strong evidence that SV(+) ATL cells are clonally selected, most likely
through escaping immune surveillance. Thus, we hypothesized that a
similar mechanism of clonal selection and immune evasion might also
operate in other, more prevalent human cancers. To investigate this, we
next interrogated PD-L1 3′-UTR-involving SVs among 10,210 cancer
samples from 33 tumour panels, for which RNA-seq data were available
from the Cancer Genome Atlas (TCGA). Aberrant 3′-UTR-truncated
PD-L1 transcripts were screened by detecting PD-L1-containing gene
fusions and/or a high exon 4 to 3′-UTR ratio (>3.5), together with
increased absolute PD-L1 exon 4 expression (reads per kb of transcript
per million mapped reads (RPKM) > 2.0) (Extended Data Fig. 5 and
Methods). We identified 31 cases expressing 3′-UTR-truncated PD-L1
transcripts (Fig. 2a, b, Extended Data Figs 5, 6, and Supplementary
Table 3). In the majority, underlying SVs implicated in the
abnormal transcripts were uniquely determined or suggested on the basis
of whole-genome/exome sequencing and/or SNP array-karyotyping
data, which were largely similar to those found in ATL cases (Extended
Data Fig. 7a). Abnormal transcripts were most frequently seen in
diffuse large B-cell lymphoma (DLBC) (4 out of 48 cases) and stomach
adenocarcinoma (STAD) (9 out of 415 cases) (Fig. 2b). Three of the
nine STAD cases were positive for Epstein–Barr virus (EBV) expression.
Expression of the aberrant PD-L1 transcripts was markedly elevated
in all cases, of which 12 exhibited the highest expression levels within
the corresponding 9 tumour types (Fig. 2b). PD-L1 copy number was
reported to correlate with PD-L1 expression in Hodgkin lymphoma,
and the correlation was also observed for DLBC, STAD, and ATL in
the present study (Fig. 2c). However, irrespective of the copy number
status of the PD-L1 gene, in the TCGA panel as well as in our ATL cases,
PD-L1 3′-UTR disruption was significantly and independently
associated with elevated PD-L1 expression. JAK2 is another potential target
of SVs at 9p24.1 (refs 7, 10), but its expression was not significantly
affected by SVs, although it did correlate with elevated genomic copy
number frequently accompanied by these SVs (Extended Data Fig. 7b).

To investigate the anti-tumour response in SV(+) samples, we 
assessed the cytolytic activity score (a geometric mean of GZMA 
and PRF1 expression) within the TCGA cohort, which was shown to 
be a reliable marker of cytotoxic T-cell infiltration and anti-tumour 
immune activity'”. Probably reflecting adaptive response to anti- 
tumour immunity, the cytolytic activity significantly correlated with 
PD-L1 expression within each tumour type (P< 1 x 107’, genera- 
lized linear model (GLM)), where most of the PD-L1 SV(+) samples 
Figure 2 | PD-L1 SVs associated with overexpression of aberrant 
PD-L1 transcripts in multiple cancers. a, Exon 4 to 3’-UTR ratio versus 
PD-L1 expression (exon 4) for 10,210 TCGA samples from 33 tumour 
types. b, PD-L1 expression in each TCGA cancer type containing PD-L1 
SV cases. Each bar represents the 10th percentile. c, Effect of genomic 
copy number and SV on PD-L1 transcript level for 48 DLBC (left), 
415 STAD (middle), and 43 ATL (right) samples. P values for SV in GLM 
are provided. d, Cytolytic activity score (geometric mean of GZMA and 
PRF1 expressions) versus PD-L1 expression for each TCGA cancer type. 
Each black line represents a regression line with Pearson’s correlation 
coefficient (R) for SV(−) cases. SV(+) samples (red) and those with virus 
integration around the PD-L1 locus (orange) are indicated. BLCA, bladder 
urothelial carcinoma; COAD, colon adenocarcinoma; ESCA, oesophageal 
squamous cell carcinoma; KIRC, kidney renal clear cell carcinoma; 
LUAD, lung adenocarcinoma; READ, rectal adenocarcinoma; SKCM, skin 
cutaneous melanoma; UCEC, uterine corpus endometrioid carcinoma. 

e, Genomic structure of the rearranged PD-L1 locus and transcription 
(orange dotted lines) in a case of CESC showing HPV 16 integrations 
within the PD-L1 gene (VS-A9U7-01) (left). Structure of the newly 
generated fusion transcript and breakpoint sequences are also shown 
(right). UD, undetermined sequence. See Extended Data Figs 5-7. 


retained a high degree of cytolytic activity (Fig. 2d), supporting the 
hypothesis that PD-L1 SV(+) cells are clonally selected in the presence 
of anti-tumour immunity through constitutively upregulating 
PD-L1 expression. It is also interesting to note that compared to SV(−) 
samples with similar levels of PD-L1 expression, SV(+) samples exhibited 
a significantly decreased cytolytic activity (P < 1 × 10⁻¹⁰, GLM), 
suggesting that there was an attenuation of anti-tumour immune 
response in SV(+) samples. 
Viral infection is a major cause of several human cancers. Intriguing 
in this regard were those TCGA cases in which viral integration was 
implicated in the aberrant PD-L1 transcription; in a case of cervical 
squamous cell carcinoma (CESC, VS-A9U7-01), human papillomavi- 
rus (HPV) 16 (ref. 18) was integrated into the PD-L1 locus, causing an 
amplification of the virally interrupted PD-L1 allele and transcription 
of a truncated PD-L1 mRNA extending into HPV E2 and E5 genes 
(Fig. 2e). Motivated by this finding, we searched for viral integrations 
into or around the PD-L1 locus in the TCGA panel (see Methods) and 
identified a similar defect caused by an HPV 16 integration in a further 
case (CV-5443-01) of head and neck squamous cell carcinoma (HNSC), 
which had previously been implicated in possible alternative PD-L1 
transcripts’’ but had escaped our initial screen relying on the exon 
4 to 3'-UTR expression ratio, despite remarkably high PD-L1 expression 
(Fig. 2b and Extended Data Fig. 7c). In the remaining case of STAD 
(FP-7998-01), a large 2.3 Mb segment containing an integrated EBV 
genome was amplified, where one of the breakpoints resided within 
the PD-L1 3’-UTR (Extended Data Fig. 7c). In cases of virally mediated 
cancers, viral integration is thought to have helped the infected cells 
escape not merely from anti-viral immunity during an early phase of 
infection, but also from later anti-cancer immunity. 

On the basis of these findings, we reasoned that loss of 3'-UTR 
sequence could be the common mechanism of the markedly elevated 
PD-L1 expression associated with these SVs. To test this hypothesis, 
we introduced large deletions/inversions involving almost the entire 
sequence of the PD-L1 3’-UTR in a variety of human and mouse 
cell lines using the CRISPR (clustered regularly interspaced short 
palindromic repeats)-Cas9 system”°, and evaluated their effects on 
PD-L1 expression (Fig. 3a and Extended Data Fig. 8a). Only when 
cells were transfected with Cas9 and both the forward (F1 or F2) and 
reverse (R) single guide (sg) RNA vectors, but not with either alone, 
were we able to reproducibly obtain a small but discrete fraction of cells 
showing significantly elevated cell surface PD-L1 expression, compared 
to parental or mock (Cas9 alone)-transfected cells (Fig. 3b, c and 
Extended Data Fig. 8b). The cells were purified and we confirmed the 
presence of intended deletions or inversions (Fig. 3d, e and Extended 
Data Figs 8c-e and 9a-d) and significantly elevated expression of 
corresponding 3’-UTR-truncated PD-L1 transcripts and proteins 
(Fig. 3f-i and Extended Data Figs 8f, g and 9e). Further assessment of 
PD-L1 transcripts using actinomycin-D-induced inhibition of de novo 
transcription demonstrated a delayed clearance of 3'-UTR-truncated, 
compared to wild-type PD-L1 mRNA (Fig. 3j), suggesting a negative 
regulatory role of PD-L1 3'-UTR in mRNA stability. The effect of 
PD-L1 3'-UTR truncation on the upregulation of PD-L1 expression 
was shown to be much greater than that of interferon-γ (IFN-γ), 
a major inducer of PD-L1 expression’, suggesting a predominant regulation 
of PD-L1 expression via the 3’-UTR sequence. Interestingly, 
however, when PD-L1 3'-UTR-disrupted cells were stimulated with 
IFN-γ, PD-L1 expression was synergistically elevated (Extended Data 
Fig. 9f). Thus, 3’-UTR-disrupted cells can more effectively upregulate 
PD-L1 expression in the presence of IFN-γ-secreting T cells to escape 
anti-tumour immunity. 

Finally, we evaluated the biological consequence of loss of PD-L1 
3'-UTR. When co-cultured with PD-1-expressing T cells (Jurkat), 
PC-9 cells with disrupted PD-L1 3’-UTR (and therefore with elevated 
PD-L1 expression) markedly enhanced apoptosis of Jurkat cells com- 
pared to mock-treated PC-9 cells with intact PD-L1 3’-UTR, which 
was blocked by anti-PD-L1 antibody (Fig. 4a). Next, to investigate 
the effect of disruption of Pd-l1 3’-UTR on anti-tumour immunity, 
we adopted a tumour regression model, in which syngeneic C57BL/6 
mice were subcutaneously inoculated with EG7-OVA cells with or 
without intact Pd-l1 3’-UTR, and anti-tumour (OVA) immunity was 
induced by immunostimulatory RNA polyinosinic-polycytidylic acid 
(poly(I:C))?! (Extended Data Fig. 10a). As previously demonstrated, 
in mice inoculated with EG7-OVA cells with intact Pd-l1 3’-UTR, 
peritumoral poly(I:C) treatment induced a marked tumour regression 
Figure 3 | 3’-UTR disruption by CRISPR-Cas9 induces PD-L1 
overexpression. a, Positions of targeting sgRNAs and detection primers 
used for CRISPR-Cas9-mediated deletions and inversions of PD-L1 
3’-UTR. WT, wild-type. b, Frequency of PD-L1⁺ cells in green 
fluorescent protein (GFP)⁺ fraction by flow cytometry in HEK293T 
cells transfected with Cas9 and no, single, or pairwise sgRNAs (n = 3). 
*P < 0.05, **P < 0.005, Welch’s t-test. c, Sorting strategy of PD-L1⁺GFP⁺ 
cells. d, e, Validation of PD-L1 3'-UTR deletions by PCR (d) and Sanger 
sequencing (e) in sorted HEK293T cells targeted by indicated sgRNAs. 

f, RNA-seq reads for HEK293T cells transfected with indicated vectors. 
g-i, Expression of PD-L1 transcripts (exon 4 RPKM) (g) and cell surface 
protein (h, i) in human (g, h) and mouse (i) cell lines transfected with 
indicated vectors. Representative of three independent experiments 

(h, i). j, Ratio of wild-type to truncated PD-L1 mRNA in steady state 
(left) and their expression levels relative to 18S rRNA (right) in ST-1 cells 
after transcriptional inhibition with 10 µg ml⁻¹ actinomycin D. *P < 0.05, 
Student's t-test. Data represent mean ± s.d. See Extended Data Figs 8, 9. 


with enhanced infiltration of CD8⁺ T cells into the tumour microenvironment 
(Fig. 4b, c and Extended Data Fig. 10b, c). By contrast, 
almost no tumour regression was observed in mice inoculated with 
3'-UTR-disrupted cells with an attenuated CD8⁺ T-cell reaction, 
which was in accordance with PD-L1 SV(+) human cancers (Fig. 2d), 
suggesting that Pd-l1 3’-UTR-disrupted cells can escape anti-tumour 
immunity mediated by cytotoxic T lymphocytes through activation 
of Pd-1/Pd-l1 signalling. In fact, blockade of the signalling with anti-Pd-l1 
antibody restored CD8⁺ cytotoxic T lymphocyte induction and 
tumour regression in the mice carrying Pd-l1 3’-UTR-disrupted cells 
(Fig. 4d-f and Extended Data Fig. 10d). These results not only suggest 
that tumour-intrinsic PD-L1 overexpression due to 3’-UTR disruption 
promotes immune evasion and tumour cell growth, but also indicate 
that PD-L1 3’-UTR-involving SVs could be potentially actionable ther- 
apeutic targets. 
Figure 4 | PD-L1 activation by 3’-UTR loss promotes tumour growth 
and immune escape. a, Frequency (mean ± s.d.) of mock- or PD-1-transfected 
Jurkat T cells undergoing apoptosis after co-incubation with parental 

or sgPD-L1-transfected PC-9 cells in the presence of isotype control 

or anti-PD-L1 antibody (n = 3). b-f, Mock- or sgPd-l1-transfected 
EG7-OVA tumours injected with PBS (blue) or poly(I:C) (red) (b, c), 
or poly(I:C)-treated, sgPd-l1-transfected EG7-OVA tumours administered 
with isotype control (red) or anti-Pd-l1 antibody (green) (d-f) were 
analysed for kinetics of tumour volume (b, d, n = 6-8 per group; 
mean ± s.e.m.), and number of tumour-infiltrating CD8⁺ T cells per field 
by immunofluorescence staining (c, e, >20 random images from 

2-3 animals per group). Representative images (from experiments in e) 
with CD8 (green) and DAPI (2’,6’-diamidino-2-phenylindole, purple) 
staining shown (f). *P < 0.05, **P < 0.005, ***P < 0.0005, Student's t-test 
(a, b, d) and Brunner-Munzel test (c, e). See Extended Data Fig. 10. 


The unique SVs commonly targeting 9p24.1 in multiple cancers 
have unmasked a critical role of the 3'-UTR sequence in the regulation 
of PD-L1 expression, which in turn provides new insights into 
how cancer cells exploit this regulatory mechanism to evade immune 
surveillance by disrupting the sequence. Although accounting only for 
a small fraction of patients in most cancer types, SVs affecting PD-L1 
are thought to represent common activating SVs in human cancers, 
like BCR-ABL and ALK fusions, affecting a substantial number of 
cancer patients. In some genes, typically those encoding transcription 
factors and cytokines, 3’-UTR is involved in post-transcriptional reg- 
ulation of mRNA decay rate, which is a major determinant of mRNA 
abundance, and deregulation of this has been implicated in human 
diseases, including cancer?””*. PD-L1 is among such genes, as it has 
in its long 3'-UTR a number of cis-acting elements involved in mRNA 
decay, including an AU-rich element and potential microRNA-bind- 
ing sites, such as those for miR-34 and miR-200 (Fig. 3j)”**. It would 
be interesting to determine the underlying molecular mechanisms of 
the 3'-UTR-mediated regulation of PD-L1 expression and how these 
affect normal as well as abnormal immunity, and how cancer cells 
dispatch or deregulate these mechanisms to evade anti-cancer immunity, 
especially in PD-L1-involving SV(−) cancers, which account for 
the vast majority of cases in which PD-L1 is overexpressed. Another 
important implication from the present study is that disrupted 
PD-L1 3'-UTR might serve as a genetic marker for identifying cancers 
that actively evade immune surveillance and therefore, potentially 
respond to immune checkpoint blockade using antibodies against 
PD-1/PD-L1. A potential problem of detecting PD-L1 expression 
with a C-terminally directed antibody alone should also be high- 
lighted. The surprisingly high efficacy of anti-PD-1/PD-L1 therapy 
in Hodgkin lymphoma, in which PD-L1 overexpression is frequently 
associated with genetic defects in PD-L1 (ref. 6), suggests that the 
above implication could be relevant for patients with ATL and other 
advanced cancers, particularly those for which no effective therapy 
is currently available. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


Patients and materials. A total of 49 ATL patients were enrolled in this study, of 
which 48 had been analysed by WGS previously'’. Diagnosis and sub-classification 
were based on the WHO classification and the International Consensus Meeting 
proposal’”*° (Supplementary Table 1). All samples were collected from patients 
with informed consent according to the protocols approved by the Institutional 
Review Boards. This study was approved by the institutional ethics committees 
of the Graduate School of Medicine, Kyoto University and other participating 
institutes. 

WGS analysis and SV detection. Methods of genomic DNA preparation and WGS 
were described previously!’. Detection of SVs was performed using our in-house 
pipeline, Genomon-SV (Y. Shiraishi et al., in preparation), which enables sensitive 
and accurate detection of a variety of SVs, relying on both breakpoint-containing 
junction reads and improperly aligned read pairs for maximizing sensitivity. 
To identify significant breakpoint cluster regions, we divided the entire genome into 
1-Mb windows, and enumerated the number of samples with at least one break- 
point within each window, as described in the literature'*!*. As ATL samples have 
frequent deletions in common fragile sites, which seem to be passenger events!, 
we first focused on the other SV types, that is, inversions, tandem duplications, and 
translocations to determine breakpoint cluster regions, and then all of the break- 
points were interrogated within the focused breakpoint cluster regions, regardless 
of SV types, that is, taking deletion-type SVs also into consideration. For those 
windows in which positive events were detected by the initial screen, breakpoints 
were visually inspected using Integrative Genomics Viewer (IGV). 
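
The window-based enumeration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Genomon-SV pipeline: the input format (tuples of sample, chromosome, 0-based position), the function name `samples_per_window` and the toy data are all hypothetical.

```python
from collections import defaultdict

WINDOW_SIZE = 1_000_000  # 1-Mb windows, as stated in the text


def samples_per_window(breakpoints, window_size=WINDOW_SIZE):
    """Count distinct samples with at least one breakpoint in each window.

    breakpoints: iterable of (sample_id, chromosome, position) tuples
    (hypothetical input format; positions are 0-based).
    Returns {(chromosome, window_index): number_of_samples}.
    """
    samples_by_window = defaultdict(set)
    for sample_id, chrom, pos in breakpoints:
        # A set per window means a sample is counted once per window,
        # however many breakpoints it has there.
        samples_by_window[(chrom, pos // window_size)].add(sample_id)
    return {window: len(samples) for window, samples in samples_by_window.items()}


# Toy data, invented for illustration only.
toy_breakpoints = [
    ("S1", "chr9", 5_460_000),
    ("S1", "chr9", 5_470_000),  # same sample, same window: counted once
    ("S2", "chr9", 5_465_000),
    ("S3", "chr3", 60_500_000),
]
counts = samples_per_window(toy_breakpoints)
```

Windows in which many samples carry a breakpoint are the candidate breakpoint cluster regions; how significance is then assessed is not specified here and is left out of the sketch.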

RNA-seq analysis. Methods of preparation of RNA samples and RNA-seq were 
previously described!®. Genome index generation and sequence alignment 
were performed using STAR software (version 2.4.0)*1, followed by sorting and 
indexing of BAM files using SAMtools (version 1.2)°*, where GRCh37 (human 
reference assembly) as well as NC_007605 (for EBV) and hs37d5 (from the 1000 
Genomes Project Phase II for decoy sequences) sequences were used as reference 
genomes. RNA-seq reads were also aligned to cancer-related viral sequences 
(including those for HPV and hepatitis B virus) available from the NCBI Viral 
Genomes Resource* using BLAT™. Positive identification of viral genomes 
required at least 1,000 read pairs properly mapped to viral sequences. For samples 
positive for a viral genome, realignment of the RNA-seq reads was performed 
after adding the corresponding viral sequence into the reference genomes. For 
transcriptome analysis of mouse samples, the GRCm38 mouse reference assem- 
bly was used as a reference. To detect fusion transcripts, we used our in-house 
program, fusionfusion, which enables effective selection of putative chimaeric 
transcripts generated by the STAR algorithm". All samples positive for candidate 
PD-L1-containing fusions together with those harbouring genomic breakpoints 
within the PD-L1 gene were manually reviewed using IGV. The transcription 
termination site was determined by detecting poly-A sequences and/or the 
genomic sequences at which the number of RNA-seq reads was abruptly 
reduced. Putative poly-A signal sequences (AAUAAA, UAUAAA, AUUAAA, or 
AGUAAA) were searched within 10-30 nucleotide upstream of the transcription 
termination site!®. A modified version of RPKM was used for expression quan- 
tification*>. For PD-L1 expression analysis, RPKM values for each exon and the 
distal part of 3’-UTR (chr9:5469203-5470567 for human and chr19:29386741- 
29388094 for mouse) were calculated. 
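
The search for putative poly-A signals upstream of a transcription termination site might look like the sketch below. It assumes that the distance is measured from the first base of the hexamer to the termination site, which the text does not specify; the function name `find_polya_signals` and the toy sequence are hypothetical.

```python
# Hexamers listed in the text as putative poly-A signals.
POLYA_SIGNALS = ("AAUAAA", "UAUAAA", "AUUAAA", "AGUAAA")


def find_polya_signals(transcript, termination_site, min_dist=10, max_dist=30):
    """Return (start_index, hexamer) for each signal found upstream.

    transcript: RNA or DNA sequence (T is read as U).
    termination_site: 0-based index of the transcription termination site.
    A hit is reported when the first base of the hexamer lies between
    min_dist and max_dist nucleotides upstream of the site (an assumption
    about how the 10-30 nucleotide window is measured).
    """
    rna = transcript.upper().replace("T", "U")
    hits = []
    for dist in range(max_dist, min_dist - 1, -1):
        start = termination_site - dist
        if start < 0:
            continue  # window extends past the 5' end of the sequence
        hexamer = rna[start:start + 6]
        if hexamer in POLYA_SIGNALS:
            hits.append((start, hexamer))
    return hits


# Toy sequence, invented for illustration: signal starts 20 nt upstream.
toy_transcript = "G" * 20 + "AATAAA" + "C" * 14
hits = find_polya_signals(toy_transcript, termination_site=40)
```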

Validation of PD-L1 SVs and their products. The validation of PD-L1 SVs and 
associated fusion transcripts was performed using both genomic and reverse 
transcription PCR, followed by Sanger sequencing. NCBI reference sequences 
(NM_014143 for nucleotide and NP_054862 for amino acid) were used as a 
reference. 

Analysis of TCGA data sets. We analysed 10,210 TCGA samples from 33 cancer 
types, for which RNA-seq data were publicly available, to interrogate whether 
3'-UTR-disrupted aberrant PD-L1 transcripts were found in cancer types other than 
ATL (Extended Data Fig. 5). Briefly, gene-level RNA-seq expression data 
(normalized RSEM (RNA-seq by expectation-maximization) value) were obtained 
from the standardized analysis-ready TCGA data, Broad GDAC Firehose 
stddata__2015_08_21 run (http://dx.doi.org/10.7908/C18W3CNQ) or otherwise, 
directly from the TCGA data portal (https://tcga-data.nci.nih.gov/tcga/) for DLBC 
and STAD samples. RNA-seq data for the corresponding samples were obtained 
from the Cancer Genomic Hub (https://cghub.ucsc.edu), and analysed through our 
in-house pipeline, as described above for sequencing alignment, PD-L1 expression 
quantification, as well as detection of fusion transcripts and viral sequences. A sam- 
ple was considered to have aberrant PD-L1 transcripts, when the sample showed 
elevated expression of PD-L1 exon 4 (RPKM > 2), together with a high PD-L1 exon 
4 versus 3'-UTR ratio in RPKM (>3.5) or PD-L1 fusion transcripts detected by 
fusionfusion. Candidate cases for aberrant PD-L1 transcripts were further assessed 
by visual inspection using IGV. Cancer-related viral integration within or near the 
PD-L1 gene was also investigated. Cytolytic activity was calculated as a geometric 
mean of GZMA and PRF1 expressions (as expressed in RPKM, 0.01 offset), as 
previously described’. The effect of PD-L1 SVs on cytolytic activity was assessed 
on the basis of a GLM using PD-L1 expression and cancer type as covariates. 
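
The screening rule and the cytolytic activity score described in this section can be written compactly as below. This is a sketch under stated assumptions rather than the in-house pipeline: the rule is read as (exon 4 RPKM > 2) and (ratio > 3.5 or fusion detected), a 3'-UTR RPKM of zero is treated as an infinite ratio, and the function names are hypothetical.

```python
import math

MIN_EXON4_RPKM = 2.0  # threshold on absolute exon 4 expression (from the text)
MIN_RATIO = 3.5       # threshold on exon 4 to 3'-UTR RPKM ratio (from the text)
OFFSET = 0.01         # offset used for cytolytic activity (from the text)


def has_aberrant_transcript(exon4_rpkm, utr_rpkm, fusion_detected=False):
    """Flag a candidate sample for manual review.

    Assumed reading of the criteria: elevated exon 4 expression is always
    required, combined with either a high exon 4 to 3'-UTR ratio or a
    detected PD-L1 fusion transcript.
    """
    if exon4_rpkm <= MIN_EXON4_RPKM:
        return False
    if fusion_detected:
        return True
    # Assumption: no 3'-UTR coverage at all counts as an infinite ratio.
    ratio = math.inf if utr_rpkm == 0 else exon4_rpkm / utr_rpkm
    return ratio > MIN_RATIO


def cytolytic_activity(gzma_rpkm, prf1_rpkm, offset=OFFSET):
    """Geometric mean of GZMA and PRF1 expression (RPKM) with an offset."""
    return math.sqrt((gzma_rpkm + offset) * (prf1_rpkm + offset))
```

For example, a sample with exon 4 RPKM of 10 and 3'-UTR RPKM of 1 would be flagged, whereas one with exon 4 RPKM of 1.5 would not, whatever its ratio. Flagged samples are candidates only; the text states that they were further assessed by visual inspection.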

Copy number data. Copy number data for ATL were analysed to estimate total 
and allele-specific copy numbers using CNAG/AsCNAR*®” and ASCAT*® 
for Affymetrix GeneChip Human Mapping 250K NspI Array and Illumina 
Human610-Quad BeadChip, respectively. The level 3 segmented copy number 
data (Affymetrix Genome-Wide Human SNP Array 6.0) were downloaded from 
the TCGA data portal (https://tcga-data.nci.nih.gov) for (i) samples with abnormal 
PD-L1 transcripts and (ii) all DLBC and STAD samples. Copy numbers for PD-L1 
exon 4 were used for further analysis. 

IHC. IHC was performed on formalin-fixed paraffin-embedded tissue sec- 
tions using antibodies directed against the N-terminal (E1J2J, Cell Signaling 
Technology) and C-terminal (SP142, Spring Bioscience) domains of PD-L1. The 
antigen-antibody complexes were visualized with Histofine Simple Stain MAX 
PO (Nichirei Bioscience). 

Cell lines. HEK293T (human embryonic kidney), PC-9 (human lung cancer), 
and Jurkat (human T-cell leukaemia) cell lines were obtained from RIKEN Cell 
Bank, and P815 (mouse mastocytoma) cell line was from JCRB. T2 (human T and 
B lymphoblast hybrid), EG7-OVA (mouse T-cell lymphoma), ST-1 (human ATL), 
and B16-F10 (mouse melanoma) cell lines were gifts from H. Kawamoto, T. Seya, 
Y. Yamada, and N. Minato, respectively. Cell lines were authenticated by the 
provider and routinely tested for mycoplasma infection. 

Gene transfer and retroviral transduction. Vector transfection was performed 
using X-tremeGENE 9 DNA Transfection Reagent (Roche) for HEK293T and 
PC-9, Lipofectamine 2000 Reagent (Thermo Fisher Scientific) for B16-F10, Neon 
transfection system (Thermo Fisher Scientific) for T2, EG7-OVA, and P815, and 
Amaxa Nucleofector (Lonza) for Jurkat, according to the corresponding manu- 
facturer’s protocol. PC-9 cells were transduced with retroviral supernatant from 
Phoenix-GALV packaging cells* (a gift from H. P. Kiem with permission from 
G. P. Nolan) transfected with the indicated vectors. 

CRISPR-Cas9-mediated genome editing. Human and mouse PD-L1 3'-UTR 
sgRNA targeted sites were designed manually and checked in silico. The 
pSpCas9(BB)-2A-GFP (pX458) vector expressing Cas9 (Addgene plasmid 48138) 
was digested with BbsI and ligated to annealed and phosphorylated sgRNA 
oligonucleotides. The sgRNA sequences are listed in Supplementary Table 4. Human 
and mouse cell lines were transfected with indicated pX458 vectors and collected 
48 h later. To validate CRISPR-mediated DNA cleavage occurring at the intended 
position, a genomic region containing the target sequence was amplified using 
KOD FX Neo DNA polymerase (TOYOBO) and gel purified. PCR primers used for 
validation are listed in Supplementary Table 5. Sequencing libraries were prepared 
from the PCR product using NEBNext Ultra DNA Library Prep Kit for Illumina 
(New England Biolabs) and sequenced on the Illumina HiSeq 2000/2500 platform 
as previously described. To establish cell lines with PD-L1 3’-UTR deletions/ 
inversions, cell lines were transfected with a pair of sgRNAs, collected 48 h later, 
and purified with FACSAria II Cell Sorter (BD Biosciences) using GFP and PD-L1 
expression as a surrogate marker for successful rearrangement. To validate deletions/ 
inversions induced by introduction of a pair of sgRNAs targeting both ends of 
PD-L1 3'-UTR, genomic PCR flanking the breakpoint region and Sanger sequencing 
were performed. This was followed by RNA-seq to assess PD-L1 expression and 
detect resultant truncation of PD-L1 transcripts. 

Plasmid constructs. GFP-tagged PD-1/PDCD1 (NM_005018) cDNA in pCMV6- 
AC-GFP vector was obtained from OriGene. The fragments of wild-type PD-L1 
(NM_014143) and its fusion cDNA were obtained by PCR amplification of cDNA 
extracted from ATL samples without (ATL046) or with (ATL020 and ATL079) 
PD-L1 SVs, respectively, then cloned into LZRSpBMN-Z vector (a gift from 
G. P. Nolan) using In-Fusion HD cloning kit (TaKaRa). 

Western blot. Cells were lysed, subjected to SDS-PAGE, and transferred to a 
PVDF membrane (Millipore). The blot was incubated with the antibodies listed in 
Supplementary Table 6, and visualized by Immobilon Western Chemiluminescent 
HRP Substrate (Millipore). 

Flow cytometry analysis. The list of antibodies used for flow cytometry is pro- 
vided in Supplementary Table 7. Stained cells were analysed on FACS LSR Fortessa 
or FACSAria II Cell Sorter (BD Biosciences). The data analyses were performed 
with FlowJo software (TreeStar). To assess PD-1 binding capacity of PD-L1 fusion 
proteins, ATL primary samples or PC-9 cells transduced with indicated PD-L1 
were first incubated with recombinant human PD-1 Fc chimaera (R&D Systems) 
at 1 µg ml⁻¹. After 30 min, the cells were washed and further incubated for 30 min 
with APC anti-human IgG Fc (BioLegend) or APC mouse IgG2a, κ isotype control 
(BioLegend). To analyse the effect of IFN-γ on PD-L1 expression, PC-9 cells 
were stimulated with different concentrations of human recombinant IFN-γ 
(100 or 300 U ml⁻¹; Roche) for 48 h, followed by analysis of cell surface expression 
of PD-L1 using flow cytometry. 

Assay for PD-L1 transcript stability. ST-1 cells harbouring wild-type and trun- 
cated PD-L1 were collected at the indicated time points after transcription was 
inhibited by adding actinomycin D (Nacalai Tesque) in cell culture at the concentration 
of 10 µg ml⁻¹. Total RNA was extracted using RNeasy Mini Kit (QIAGEN), 
followed by cDNA synthesis with ReverTra Ace qPCR RT Kit (TOYOBO), and 
subjected to quantitative reverse transcription PCR with SYBR Premix Ex Taq II 
(Tli RNaseH Plus) (TaKaRa) and LightCycler 480 System (Roche) according to 
the manufacturer’s instructions. PCR primers used are listed in Supplementary 
Table 8. All assays were performed in three technical replicates for each biological 
replicate and relative expression was normalized for 18S rRNA. We also searched 
for potential regulatory elements in PD-L1 3’-UTR sequence using UTRdb”’. 

In vitro co-culture assay. Mock- or PD-1-transfected Jurkat T cells were 
co-incubated with parental or sgPD-L1-transfected PC-9 cells in the presence 
of anti-PD-L1 antibody (29E.2A3; BioLegend) or mouse IgG2b isotype control 
(MPC-11; BioLegend) for 18h, and then assayed for apoptosis. For detection 
of apoptotic cells, cells were stained with DAPI and Annexin V-FITC (BD 
Biosciences) according to the manufacturer's protocol. 

In vivo EG7-OVA tumour model. All animal experiments were approved by the 
Animal Research Committee, Graduate School of Medicine, Kyoto University and 
the Hokkaido University Animal Care and Use Committee and strictly adhered to 
their guidelines. Female C57BL/6 mice (6-10 weeks old) were obtained from CLEA 
Japan and maintained under pathogen-free conditions. Mice were subcutaneously 
transplanted with 2 x 10° mock- or sgPd-l1-transfected EG7-OVA cells in PBS. On 
day 7, PBS or 50 µg poly(I:C) were subcutaneously injected around the tumour. On 
day 14 or 15, tumours were collected from tumour-bearing mice and further ana- 
lysed. Tumour size was measured at regular intervals with a caliper, and calculated 
using the following formula: tumour size (cm³) = (long diameter) × (short 
diameter)² × 0.52. To examine the effect of Pd-1/Pd-l1 blockade, we intraperitoneally 
injected recipient animals with 200 µg anti-mouse Pd-l1 antibody (10F.9G2; 
Bio X Cell) or rat IgG2b isotype control (LTF-2; Bio X Cell) on days 7, 9, 11, and 13. 
No statistical methods were used to predetermine sample size. No randomization 
or blinding was performed. No tumour exceeded the maximum size approved by 
the animal welfare committee and regulations. 
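
As a worked illustration of the tumour size formula given above, the calculation can be expressed as a small function. The function name and the example diameters are hypothetical; diameters are assumed to be measured in centimetres so that the result is in cubic centimetres.

```python
def tumour_size_cm3(long_diameter_cm, short_diameter_cm):
    """Tumour size as (long diameter) x (short diameter)^2 x 0.52.

    Both diameters are assumed to be in centimetres (caliper measurements).
    """
    return long_diameter_cm * short_diameter_cm ** 2 * 0.52


# Example values, invented for illustration: a 1.0 cm by 0.5 cm tumour.
example_size = tumour_size_cm3(1.0, 0.5)
```

With these example diameters the formula gives 1.0 × 0.25 × 0.52 = 0.13 cm³.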

Analysis of EG7-OVA tumours. For immunofluorescence imaging, 4% paraformaldehyde-fixed 
frozen sections were stained with anti-CD8a-APC antibody 
(53-6.7; BioLegend) and DAPI (Thermo Fisher Scientific), and examined 
using LSM510 META confocal microscope (Zeiss). The number of CD8⁺ T cells 
infiltrating into tumours was counted for randomly selected fields (0.2 mm² per 
field). For flow cytometric analysis, tumours were finely minced and treated with 
collagenase I (Sigma-Aldrich), collagenase IV (Sigma-Aldrich), hyaluronidase 
(Sigma-Aldrich), and DNase I (Roche) in Hank's Balanced Salt Solution at 33 °C 
for 10 min, and stained with antibody. 
Statistical methods. Statistical analyses were performed with R 3.1.3 software 
(The R Foundation for Statistical Computing). Normality of data distribution 
and homogeneity of variance were assessed by the Shapiro-Wilk’s test and F-test, 
respectively. Student's two-tailed t-test was used to compare two groups and a 
Welch's correction was applied when comparing groups with unequal variance 
(F-test P< 0.05). Brunner-Munzel test was performed when normal distribution 
could not be assumed (Shapiro-Wilk’s test P< 0.05). GLM was used to assess 
the effect of PD-L1 SVs and DNA copy number on PD-L1 and JAK2 expression. 
otherwise specified. 
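
The rule for choosing a two-group test described in this section can be summarized as a small decision function. This is a sketch of the decision logic only, not the R code used in the study: it takes the P values of the preliminary tests as inputs, and it assumes that normality is rejected when either group fails the Shapiro-Wilk test. The function name is hypothetical.

```python
ALPHA = 0.05  # significance threshold for the preliminary tests (from the text)


def choose_two_group_test(shapiro_p_a, shapiro_p_b, f_test_p, alpha=ALPHA):
    """Return the name of the two-group test suggested by the stated rules.

    shapiro_p_a, shapiro_p_b: Shapiro-Wilk P values for the two groups.
    f_test_p: P value of the F-test for homogeneity of variance.
    """
    # Assumption: normality cannot be assumed if either group fails.
    if shapiro_p_a < alpha or shapiro_p_b < alpha:
        return "Brunner-Munzel"
    # Normal data with unequal variance: apply Welch's correction.
    if f_test_p < alpha:
        return "Welch"
    return "Student"
```

For instance, two groups with Shapiro-Wilk P values of 0.40 and 0.60 and an F-test P value of 0.01 would be compared with Welch's t-test under this rule.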
Extended Data Figure 1 | PD-L1 SVs affecting 3’-UTR in ATL. 
a, Diagram showing numbers of ATL samples investigated by WGS/RNA-seq. 
b, Genome-wide distribution of SV (without deletion) breakpoints 
in 49 ATL samples, showing a prominent peak at the PD-L1 locus. 
PD-L1 gene are shown by indicated colours. d, 3’-truncated PD-L1 mRNA 
transcripts observed in ATL cases. RNA-seq data are visualized by IGV for 
ATL samples with or without PD-L1 SVs. Aberrant PD-L1 transcripts are 
shown in red. 
Extended Data Figure 2 | Fusion genes involving PD-L1 and non-coding 
sequences identified in ATL. a, ATL020 fused sequence: the 
sequences derived from PD-L1 exons 1-6 and INSL6 intron 1 are marked 
in yellow and green, respectively. The non-template sequence is marked in 
blue, and stop codon (UAG), poly-A signal (AAUAAA) and poly-A 
in red. The putative translated region is underlined. b, Genomic structure 
of the rearranged PD-L1 locus and transcription in two representative 
cases (ATL012 and ATL079) with 3’-UTR-truncated PD-L1 transcripts, 
in which PD-L1 ORF is terminated before exon 6 or 7, and merged into 
an intergenic sequence. Breakpoints (blue dotted lines) are shown with 
accompanying copy number alterations. c, Structure and breakpoint 
sequence of PD-L1 fusion transcripts (top) with Sanger sequencing 
chromatogram (bottom). d, Length of abnormal PD-L1 transcripts 
identified in ATL samples with PD-L1 SVs, compared with wild-type 
PD-L1. e, Genomic and transcript sequences from the PD-L1 locus 
containing 327 bp inversion within the last exon identified in case ATL017. 
Aberrant PD-L1 transcripts have a putative poly-A signal sequence in the 
inverted region, followed by poly-A tract. 
Extended Data Figure 3 | Elevated PD-L1 mRNA expression in ATL 
according to PD-L1 SV state. Diagonal plots between PD-L1 exon 3 
expression (RPKM) and its relative value to that of 3’-UTR (exon 3 to 
3'-UTR ratio) for 43 ATL cases. SV(+) cases are indicated by 
corresponding colours to each SV type. 
Extended Data Figure 4 | PD-L1 fusion proteins identified in ATL. 

a, Amino acid sequence alignment of wild-type and truncated PD-L1 
proteins. Transmembrane domain is shaded in blue. Conserved regions 
are shown in red. b, Western blot analysis with antibodies against the 
N-terminal and C-terminal domains of PD-L1 in PC-9 cells transduced 
with indicated PD-L1 constructs. c, d, Flow cytometry plots for PD-L1 


surface expression (c) and PD-1 Ig binding (d) in PC-9 cells transduced 
with indicated PD-L1 constructs. e, Western blot of PD-L1 SV(+) cases 
harbouring intact or truncated ORFs, compared with SV(−) cases using 
antibodies specifically detecting N-terminal and C-terminal domains of 
PD-L1. b-e, Representative of three independent experiments. 
Extended Data Figure 5 | Flow chart for detecting abnormal PD-L1 
transcripts in the TCGA cohort. In total, 10,210 tumour samples in 
33 tumour types, for which RNA-seq data were available in TCGA, were 
STAR algorithm. Viral integration within or near the PD-L1 gene was also 
searched in this cohort. After manual review by IGV, a total of 32 cases 
with aberrant PD-L1 transcription were identified. 
Extended Data Figure 6 | 3’-truncated PD-L1 mRNA transcripts in the TCGA cohort. RNA-seq data are visualized by IGV for TCGA samples with 
abnormal PD-L1 transcription. Aberrant PD-L1 transcripts are shown in red. 
Extended Data Figure 7 | Aberrant transcription affecting PD-L1 
3’-UTR and associated genomic alterations identified in multiple 
cancers. a, Genomic structure of the rearranged PD-L1 locus and 
transcription in FA-A4XK-01 (DLBC), BP-4983-01 (KIRC), L5-L4OE-01 
(ESCA), and F5-6814-01 (READ), showing loss of PD-L1 3’-UTR 
transcription and fusion transcripts between PD-L1 and intronic or 
intergenic segments. Breakpoints (blue dotted lines) are shown with 
accompanying copy number alterations. Del, deletion. b, PD-L1 DNA copy 
number versus JAK2 mRNA expression across 48 DLBC (left), 415 STAD 


(middle), and 43 ATL (right) samples. SV(+) samples (red) and those with 
9p24.1 copy number gains involving both JAK2 and PD-L1 genes (orange) 
are indicated. P values for the effects of PD-L1 SVs and copy number on 
JAK2 expression (GLM) are shown. c, Genomic structure of the rearranged 
PD-L1 locus and transcription in two cases with viral integrations around 
the PD-L1 gene; a STAD case (FP-7998-01) with an EBV integration (top) 
and an HNSC case (CV-5443-01), showing HPV 16 integration, which was 
described previously’”, and premature termination of PD-L1 transcripts 
within intron 4 (bottom). 
Extended Data Figure 8 | Induction of Pd-l1 3′-UTR deletions and
inversions in mouse cell lines using the CRISPR-Cas9 system.
a, Positions of targeting sgRNAs used for CRISPR-Cas9-mediated
disruption of Pd-l1 3′-UTR are indicated by arrows. b, Pd-l1 surface
expression in EG7-OVA cells transfected with Cas9 and no, single,
or pairwise sgRNAs. Representative of three independent experiments.
c, d, PCR detection of the Pd-l1 3′-UTR deletion (c) or inversion (d)
breakpoint junction from EG7-OVA, P815, and B16-F10 cells in which
Cas9 was expressed without (parental) or with no sgRNA (mock),
or a pair of Pd-l1 sgRNAs. e, Sequence chromatogram of the detected Pd-l1
3′-UTR deletions from sgPd-l1-transfected EG7-OVA, P815, and B16-F10
cells. f, g, Pd-l1 exon 4 mRNA expression (RPKM) was calculated from
the RNA-seq data for EG7-OVA, P815, and B16-F10 cells in which Cas9
was expressed without (parental) or with no sgRNA (mock), or a pair of
Pd-l1 sgRNAs (f). RNA-seq reads within the Pd-l1 gene were visualized
by IGV (g).
Extended Data Figure 9 | Induction of PD-L1 3′-UTR deletions and
inversions in human cell lines using the CRISPR-Cas9 system. a, PCR
detection of the PD-L1 3′-UTR deletion breakpoint junction from
HEK293T and T2 cells in which Cas9 was expressed without (parental) or with no sgRNA
(mock), or a pair of PD-L1 sgRNAs. b, Sequence chromatogram of the
detected PD-L1 3′-UTR deletions from sgPD-L1-transfected HEK293T
and T2 cells. c, PCR detection of the PD-L1 3′-UTR inversion breakpoint
junction from HEK293T, T2, and PC-9 cells in which Cas9 was expressed
without (parental) or with no sgRNA (mock), or a pair of PD-L1 sgRNAs.
d, Sequence chromatogram of the detected PD-L1 3′-UTR inversions from
sgPD-L1-transfected HEK293T, T2, and PC-9 cells. e, Visualization of
RNA-seq reads within the PD-L1 gene for T2 and PC-9 cells in which Cas9
was expressed without (parental) or with no sgRNA (mock), or a pair of
PD-L1 sgRNAs. f, Flow cytometric analysis of PD-L1 surface expression in
parental or sgPD-L1-transfected PC-9 cells stimulated with IFN-γ (100 or
300 U ml−1) for 48 h. Representative of three independent experiments.
Extended Data Figure 10 | Tumour-intrinsic Pd-l1 activation by 3′-UTR
loss suppresses CD8+ cytotoxic T lymphocyte recruitment within
the tumour microenvironment. a, Strategy for evaluating the effect of
Pd-l1 3′-UTR disruption on anti-tumour immunity. b, Representative
immunofluorescence images (from experiments in Fig. 4c) of CD8 (green)
and DAPI (purple) staining in mock- and sgPd-l1-transfected EG7-OVA
tumours treated with PBS or poly(I:C). c, Flow cytometric analysis
showing frequency of CD8+ T cells infiltrating into mock- and sgPd-l1-transfected
EG7-OVA tumours treated with PBS or poly(I:C) (n = 6 per
group; Welch's t-test). Data represent mean ± s.e.m. d, Flow cytometric
analysis showing frequency of CD8+ T cells infiltrating into sgPd-l1-transfected,
poly(I:C)-treated EG7-OVA tumours treated with isotype
control or anti-Pd-l1 antibody (n = 7 per group; Welch's t-test). Data
represent mean ± s.e.m.
Image-based detection and targeting of therapy
resistance in pancreatic adenocarcinoma 
Pancreatic intraepithelial neoplasia is a pre-malignant lesion
that can progress to pancreatic ductal adenocarcinoma, a highly 
lethal malignancy marked by its late stage at clinical presentation 
and profound drug resistance¹. The genomic alterations that
commonly occur in pancreatic cancer include activation of KRAS2 
and inactivation of p53 and SMAD4 (refs 2-4). So far, however, it 
has been challenging to target these pathways therapeutically; thus 
the search for other key mediators of pancreatic cancer growth 
remains an important endeavour. Here we show that the stem 
cell determinant Musashi (Msi) is a critical element of pancreatic 
cancer progression both in genetic models and in patient-derived 
xenografts. Specifically, we developed Msi reporter mice that 
allowed image-based tracking of stem cell signals within cancers, 
revealing that Msi expression rises as pancreatic intraepithelial 
neoplasia progresses to adenocarcinoma, and that Msi-expressing 
cells are key drivers of pancreatic cancer: they preferentially 
harbour the capacity to propagate adenocarcinoma, are enriched 
in circulating tumour cells, and are markedly drug resistant. This 
population could be effectively targeted by deletion of either Msi1 or
Msi2, which led to a striking defect in the progression of pancreatic 
intraepithelial neoplasia to adenocarcinoma and an improvement in 
overall survival. Msi inhibition also blocked the growth of primary 
patient-derived tumours, suggesting that this signal is required 
for human disease. To define the translational potential of this 
work we developed antisense oligonucleotides against Msi; these 
showed reliable tumour penetration, uptake and target inhibition, 
and effectively blocked pancreatic cancer growth. Collectively, 
these studies highlight Msi reporters as a unique tool to identify 
therapy resistance, and define Msi signalling as a central regulator 
of pancreatic cancer. 

To understand the mechanisms that underlie pancreatic cancer 
development and progression, we investigated signals that control 
self-renewal, a key stem cell property often hijacked in cancer. In par- 
ticular, we focused on the role of Msi, a highly conserved RNA binding 
protein originally identified in Drosophila⁵. While Msi has long been
used as a marker of stem/progenitor cells⁶, the breadth of its functional
impact is only beginning to emerge: genetic loss-of-function
models have shown that Msi signalling is important for maintaining
stem cells in the mammalian nervous system⁷, and more recently in
normal and malignant haematopoiesis⁸⁻¹². However, the role of Msi in
pancreatic cancer biology and whether it may be a viable therapeutic
target remain unknown.

To address these questions, we first analysed MSI expression in 
human pancreatic cancers. MSI1 and MSI2 were expressed in all primary
tumour samples analysed, with expression increasing during progression
(Extended Data Fig. 1). To track the function of Msi-expressing
cells, we developed Msi knock-in reporters (Reporter for Musashi,
REM) in which fluorescent signals reflected endogenous Msi expression
(Fig. 1a, b, Extended Data Fig. 2a–c and ref. 31). To define whether
Msi-expressing cells contribute to pancreatic cancer, we crossed REM
mice to the KrasLSL-G12D/+;p53f/f;Ptf1aCre/+ model¹³⁻¹⁵ (Extended Data
Fig. 2d-h). In vivo imaging of living tumours revealed clear Msi1 and
Msi2 reporter activity within remarkable spatially restricted domains
frequently surrounded by blood vessels (Fig. 1c, d, Extended Data Fig. 2i
and Supplementary Video 1). Cells with high levels of Msi reporter
expression were rare, and detected in 1.18% and 9.7% of REM1 and
REM2 cancers, respectively (Fig. 1e, f). Because cancer stem cells can
be similarly rare¹⁶,¹⁷, we tested if Msi-expressing cells have preferential
capacity for tumour propagation¹⁸. Consistent with this possibility,
Msi+ cells expressed ALDH¹⁹, and were dramatically more tumorigenic
in vitro and in vivo (Fig. 1g-i and Extended Data Fig. 3a-g). Most importantly,
Msi2+ cells were highly lethal: while 100% of mice orthotopically
transplanted with Msi2+ cells developed invasive tumours and died,
none of the mice receiving Msi2− cells showed signs of disease (Fig. 1j
and Extended Data Fig. 3h). Given the suggestion that certain markers
may not consistently enrich for tumour propagating ability²⁰, our findings
indicate that Msi expression can identify cancer stem cells at least
in some contexts, and that Msi2+ cells preferentially drive pancreatic
cancer growth, invasion, and lethality.

Msi2+ cells also represented a high proportion of circulating tumour
cells, and were more tumorigenic than Msi2− circulating tumour cells
(Fig. 1k, l). While this suggests that Msi2+ circulating tumour cells
may pose a greater risk for tumour dissemination²¹, the fact that Msi
was not consistently elevated in metastatic patient samples analysed
leaves open the question of its role in metastasis. The Msi reporter
also provided an opportunity to define if it could be used to identify
therapy resistance. Exposure to gemcitabine led to preferential survival
of Msi2+ cells even at high doses (Fig. 1m, n and Extended Data
Fig. 3i-k). These experiments show that Msi2+ cells are a predominant
gemcitabine-resistant population, and suggest Msi reporters could
Figure 1 | Msi reporter+ pancreatic cancer cells are enriched for
tumour-initiating capacity. a, b, Design of Msi reporter constructs
(REM1, Msi1eYFP/+; REM2, Msi2eGFP/+). c, d, Live images of Msi reporter
cells in (c) REM1-KPf/fC and (d) REM2-KPf/fC tumours; VE-cadherin
(magenta), Hoechst (blue), Msi reporter (green). e, f, Msi1 and Msi2
reporter expression in dissociated tumours (n = 6). g, h, Sphere-forming
ability of Msi reporter+ and reporter− cells (g, n = 8; h, n = 6). i, In vivo
growth of Msi2 reporter+ tumour cells (n = 8). j, Survival of mice
orthotopically transplanted with Msi2 reporter+ and reporter− KPf/fC
tumour cells (n = 6). Log-rank (Mantel–Cox) survival analysis (P < 0.05).
k, Reporter frequency in primary tumours (n = 3), and circulating tumour
cells from ascites (n = 3) or peripheral blood (n = 4). l, Average frequency
of tumour-spheres from Msi2 reporter+ and reporter− circulating tumour
cells (n = 2–4 technical replicates). m, n, Reporter frequency in REM2-KPf/fC
mice treated with vehicle or 500 mg per kg (body weight) gemcitabine
(n = 6). Data are represented as mean ± s.e.m. *P < 0.05, **P < 0.01,
***P < 0.001, ****P < 0.0001 by Student's t-test or one-way analysis of
variance (ANOVA). Source data for all panels are available online.


serve as a tool to visualize drug-resistant cells, and identify therapies 
to target them. 

Because Msi expression rose during progression (Extended Data 
Figs 1f-k and 4a), and marked therapy-resistant cells, we tested if 
genetic or pharmacological targeting of Msi could eradicate this ‘high-risk’
population. Deletion of Msi1 led to a fivefold reduction in tumour
volume by magnetic resonance imaging (MRI) (Fig. 2a, b, Extended
Data Fig. 4b and Supplementary Videos 2-4). Histologically, adenocarcinoma
areas comprised 67% of wild-type (WT)-KPf/fC but less than
10% of Msi1−/−-KPf/fC pancreata; further, while Msi1 loss allowed low-grade
pancreatic intraepithelial neoplasias (PanINs) to form, it largely
blocked progression to adenocarcinoma (Fig. 2c-f and Extended Data
Fig. 4c, d). Finally, Msi1 deletion improved survival in orthotopic grafts:
median survival for WT-KPf/fC graft recipients was 28.5 days, and for
Msi1−/−-KPf/fC grafts was 70.5 days, representing a 2.5-fold increase in
survival time and a 23-fold decrease in risk of death (Fig. 2g).
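
The fold increase in survival time follows directly from the two median values reported above. A minimal sketch of that arithmetic, assuming only those two medians (the function name `fold_change` is illustrative and not part of the study's analysis):

```python
def fold_change(treated: float, control: float) -> float:
    """Return the ratio treated / control."""
    if control <= 0:
        raise ValueError("control must be positive")
    return treated / control


# Median survival of orthotopic graft recipients, in days (values from the text).
WT_MEDIAN_DAYS = 28.5
MSI1_KO_MEDIAN_DAYS = 70.5

survival_fold = fold_change(MSI1_KO_MEDIAN_DAYS, WT_MEDIAN_DAYS)
print(f"{survival_fold:.2f}-fold")  # prints "2.47-fold"
```

The ratio rounds to the 2.5-fold figure quoted in the text. The 23-fold decrease in risk of death cannot be derived from the medians alone; it presumably comes from the survival model fitted to the full data.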

Because both Msi1 and Msi2 are expressed in pancreatic cancer,
we also analysed the impact of deleting Msi2 (ref. 9). MRI showed 
Figure 2 | Loss of Msi1 or Msi2 impairs tumour initiation and
progression in a genetic mouse model of pancreatic cancer. a, Coronal
and sagittal MRI images of normal, WT-KPf/fC, and Msi1−/−-KPf/fC
mice with three-dimensional volume rendering of tumour mass (red).
b, Average volumes of isolated WT-KPf/fC (n = 13) and Msi1−/−-KPf/fC
tumours (n = 9). c, d, Histology and (e, f) quantification of PanIN and/or
adenocarcinoma areas in WT-KPf/fC and Msi1−/−-KPf/fC tumours.
g, Survival of mice orthotopically grafted with Msi1−/−-KPf/fC or
WT-KPf/fC tumours (n = 16). Analysis of Msi2−/−-KPf/fC tumours (h) by
MRI and (i) after isolation, WT-KPf/fC (n = 5), Msi2−/−-KPf/fC (n = 7).
j–m, Histology of WT-KPf/fC and Msi2−/−-KPf/fC pancreatic tumours
(×40 magnification); k, adenocarcinoma, liver invasion (green arrows);
l, adenocarcinoma (yellow arrows); m, PanINs (blue arrows). n, o,
Quantification of PanIN and/or adenocarcinoma areas in WT-KPf/fC
and Msi2−/−-KPf/fC tumours (n = 6). p, Survival of autochthonous
Msi2−/−-KPf/fC (n = 19) or WT-KPf/fC (n = 32) mice. Log-rank (Mantel–Cox)
survival analysis (P < 0.0001). Data represented as mean ± s.e.m.
**P < 0.01, ***P < 0.001 by Student's t-test. Source data for all panels are
available online.


no detectable tumour mass in most Msi2−/−-KPf/fC mice (Fig. 2h, i,
Extended Data Fig. 4e and Supplementary Videos 2, 5 and 6).
Histologically, KPC pancreata were mostly replaced by adenocarcinoma,
often accompanied by extracapsular invasion into surrounding
structures; in contrast, Msi2−/−-KPf/fC pancreata contained low-grade
PanIN with rare high-grade PanIN and microscopic foci of
adenocarcinoma within predominantly normal tissue (Fig. 2j–o).
Median survival, tracked in the autochthonous model, was 122 days
for Msi2−/−-KPf/fC versus 87 days for WT-KPf/fC mice (Fig. 2p),
representing a fourfold decreased risk of death. Collectively, our
data show that Msi inhibition markedly improves disease trajectory,
leading to an approximate doubling of survival. The fact that the
mice ultimately succumbed to disease is probably due to the strong
selection for Msi-expressing escaper cells in Msi1 and Msi2 single,
or double, knockout mice (Extended Data Fig. 5). Additionally,
some redundancy between Msi1 and Msi2, as well as a partial gene
fragment present in Msi1−/− mice (data not shown), may also exert
compensatory activity. 
Figure 3 | Msi controls expression of key oncogenic and epigenetic
signals. a, Msi RIP-PCR for indicated transcripts. b, c, Frequency of
phospho-cMet+ cells in WT-KPf/fC, Msi1−/−-KPf/fC, and Msi2−/−-KPf/fC
mice (b, n = 8; c, n = 6). d, Schematic of CMET exons and 3′ UTR. CLIP
tags (red triangles) indicate MSI1 binding in 3′ UTR. e, CMET
3′ UTR luciferase reporter activity in the presence or absence of MSI1
or MSI2 (n = 3 independent experiments). f, Colony formation of MSI1
or MSI2 knockdown cells with or without cMET (n = 4 independent
experiments). g, h, Fluorescence-activated cell sorting (FACS) analysis of
tumours from gemcitabine-treated REM2-KPf/fC mice, in the presence
or absence of crizotinib and iBet762; vehicle (n = 7), gemcitabine (n = 3),
gemcitabine + iBet762 (n = 3), gemcitabine + crizotinib (n = 3). Data
represented as mean ± s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001 by
Student's t-test or one-way ANOVA. NS, not significant. Source data for
all panels are available online.


To understand the molecular basis of the effects of Msi loss, we 
genomically profiled Msi-deficient tumour cells (Extended Data
Figs 6 and 7a-d). Msi loss led to downregulation of many key genes, 
including regulators of stem cell function (Wnt7a, Aldh, Lin28), proto- 
oncogenes (cMet, Fos, Fyn) and Regenerating (Reg) family genes, linked 
to gastrointestinal cancers. Among these, analysis of 3’ untranslated 
regions (UTRs) for Msi binding sites and RNA immunoprecipitation 
(RIP)-qPCR identified BRD4, cMET, and HMGA2 as potential direct 
targets (Fig. 3a and Extended Data Fig. 7e). We focused on cMET²²,
which was diminished in Msi-null pancreatic cancer and bound MSI1 in
ultraviolet-cross-linked immunoprecipitation followed by sequencing
(CLIP-seq) experiments (Fig. 3b-d and Extended Data Fig. 7f, g).
cMET could not only be activated molecularly by MSI but also effectively
complemented MSI loss (Fig. 3e, f and Extended Data Fig. 7h).
While these results suggest that cMet is a direct functional target of 
Msi, it is almost certainly one of many. In fact, the powerful impact 
of Msi on cancer is probably because of its ability to control a broad 
range of programs (Extended Data Fig. 6). In this context, BRD4 and 
HMGA2 may represent a particularly attractive class of targets²³,²⁴,
Figure 4 | Targeting MSI inhibits pancreatic cancer growth in patient-derived
xenografts. a, b, Frequency of green fluorescent protein-positive
(GFP+) tumour cells before and after transplantation. c, MSI1 expression
after MSI1-ASO free uptake in human pancreatic cancer line (n = 3
independent experiments per dose). d, Colony formation of control or
MSI1-ASO-treated human pancreatic cancer line (n = 3 independent
experiments). e, In vivo growth of human cell-line-derived tumours in
control or MSI1-ASO-treated mice (n = 10). f, Relative tumour volume
and (g) rate of growth of KPf/fC-derived tumours in control or MSI1-ASO-treated
mice (n = 8). h, Malat1 expression in autochthonous KPf/fC
tumours after systemic delivery of control or lead-optimized Malat1-ASO
(n = 6). Data represented as mean ± s.e.m. *P < 0.05, **P < 0.01,
***P < 0.001 by one-way ANOVA. NS, not significant. Source data for
all panels are available online.


as they could act at an epigenetic level with cMet to collectively mediate 
Msi function. Emphasizing such a potential convergence of epigenetic 
and oncogenic pathways, inhibitors of both Brd4 and cMet effectively 
targeted gemcitabine-resistant Msi2* cells (Fig. 3g, h). 

To complement the mouse models, we tested the impact of MSI inhi- 
bition on primary patient samples, which harbour more complex muta- 
tions, and are uniformly drug resistant. Primary pancreatic cancer cells 
were infected with MSI short hairpin RNAs (shRNAs) and implanted 
as xenografts (Extended Data Fig. 8a). While shMSI cells were equiv- 
alently present at time of transplant, their ability to contribute to the 
tumour mass in vivo was reduced by 4.9- to 6.5-fold (Fig. 4a, b and 
Extended Data Fig. 8b, c), demonstrating that inhibition of either MSI1 
or MSI2 results in marked suppression of primary human pancreatic 
cancer growth. Interestingly, MSI2 expression was more homogeneous
in patients than in mouse models (Extended Data Figs 1a, b and
2d, e). This could be a consequence of selection due to treatment and 
end-stage disease in patients, or because MSI2 patterns differ between 
mouse models and human disease. However, regardless of the level of 
heterogeneity, our loss-of-function studies indicate that the mouse and 
human disease are both highly dependent on Msi signalling. 

Given that inhibition of Msi has profound effects on pancreatic 
cancer progression, we explored its potential as a therapeutic target 
by developing antisense oligonucleotides (ASOs)²⁵,²⁶ specific for
MSI1. Because ASO inhibitors are designed on the basis of target RNA 
sequences, they can be a powerful approach for inhibiting proteins such 
as Msi, considered ‘undruggable’ by traditional approaches²⁷. Of 400
candidate MSI1-ASOs screened, the two most potent markedly reduced 
colony formation, as well as human cell line and KPC derived tumour 
growth in vivo (Fig. 4c-g and Extended Data Fig. 8d, e). The MSI1- 
ASOs have not yet been lead-optimized, a longer-term process designed 
to maximize therapeutic level efficacy with systemic delivery. To test if 
a lead-optimized ASO can penetrate the tumour microenvironment, a 
lead-optimized ASO against Malat1 was delivered intraperitoneally and 
was effective in knocking down its target both in stem and in non-stem 
cell fractions (Fig. 4h and Extended Data Fig. 8f-j). These studies pro- 
vide proof-of-principle that deliverable Msi inhibitors can antagonize 
pancreatic cancer growth in vivo, and suggest that ASOs should be 
explored further as a new class of therapeutics in this disease. 

The Msi reporters we describe here may be broadly applicable for 
cancer diagnostic and therapeutic studies. Because Msi reporter activity 
can be visualized through live imaging, these mice can be used to track 
cancer stem cells in vivo, and provide a dynamic view of cancer growth 
and dissemination within the native microenvironment. The fact that 
reporter+ cells are gemcitabine resistant raises the exciting possibility
that this could serve as a platform to visualize resistance in vivo. 
Integration of such reporters during drug development may provide 
a powerful complement to conventional screens, and allow identifica- 
tion of therapies that can better target drug-resistant disease. Further, 
the spatially restricted distribution of Msi+ cells could have important
implications for designing strategies to loco-regionally target cells that 
drive residual disease and relapse. 

One of the biggest disappointments in pancreatic cancer therapy has 
been the failure of targeted agents to make a meaningful impact. Our 
data demonstrate that Msi function is critical for growth and progres- 
sion of pancreatic cancer, and Msi therefore represents an attractive 
therapeutic target. We also show that cell-penetrating ASOs are able to 
antagonize Msi and inhibit growth of pancreatic cancer. These findings 
highlight the value of targeting Msi, and suggest that ASOs²⁷⁻³⁰ and
other antagonists should be developed for pancreatic and other cancers 
marked by high Msi expression. Finally, the rise of Msi in pancrea- 
titis (Extended Data Fig. 9) raises the possibility that Msi inhibition 
could serve as a strategy to decrease the risk of developing pancreatic 
cancer. In the long term, blocking Msi signalling could provide a new 
approach to controlling cancer establishment, progression, and therapy 
resistance. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 


Received 15 September 2015; accepted 7 April 2016. 
Published online 6 June 2016. 


1. Yachida, S. & Iacobuzio-Donahue, C. A. The pathology and genetics of
metastatic pancreatic cancer. Arch. Pathol. Lab. Med. 133, 413-422 
(2009). 

2. Almoguera, C. et al. Most human carcinomas of the exocrine pancreas contain 
mutant c-K-ras genes. Cell 53, 549-554 (1988).

3. Hahn, S.A. et al. DPC4, a candidate tumor suppressor gene at human 
chromosome 18q21.1. Science 271, 350-353 (1996). 

4. Redston, M. S. et al. p53 mutations in pancreatic carcinoma and evidence of
common involvement of homocopolymer tracts in DNA microdeletions. 
Cancer Res. 54, 3025-3033 (1994). 

5. Nakamura, M., Okano, H., Blendy, J. A. & Montell, C. Musashi, a neural 
RNA-binding protein required for Drosophila adult external sensory organ 
development. Neuron 13, 67-81 (1994). 

6. Okano, H., Imai, T. & Okabe, M. Musashi: a translational regulator of cell fate.
J. Cell Sci. 115, 1355-1359 (2002).

7. Sakakibara, S. et al. RNA-binding protein Musashi family: roles for CNS stem 
cells and a subpopulation of ependymal cells revealed by targeted disruption 
and antisense ablation. Proc. Natl Acad. Sci. USA 99, 15194-15199 (2002).

8. Hope, K. J. et al. An RNAi screen identifies Msi2 and Prox1 as having opposite 
roles in the regulation of hematopoietic stem cell activity. Cell Stem Cell 7, 
101-113 (2010). 

9. Ito, T. et al. Regulation of myeloid leukaemia by the cell-fate determinant 
Musashi. Nature 466, 765-768 (2010). 

10. Kharas, M. G. et al. Musashi-2 regulates normal hematopoiesis and promotes 
aggressive myeloid leukemia. Nature Med. 16, 903-908 (2010). 


11. Kwon, H. Y. et al. Tetraspanin 3 is required for the development and 
propagation of acute myelogenous leukemia. Cell Stem Cell 17, 152-164
(2015). 

12. de Andrés-Aguayo, L. et al. Musashi 2 is a regulator of the HSC compartment 
identified by a retroviral insertion screen and knockout mice. Blood 118, 
554-564 (2011). 

13. Hingorani, S. R. et al. Preinvasive and invasive ductal pancreatic cancer and its 
early detection in the mouse. Cancer Cell 4, 437-450 (2003). 

14. Kawaguchi, Y. et al. The role of the transcriptional regulator Ptf1a in
converting intestinal to pancreatic progenitors. Nature Genet. 32, 128-134 
(2002). 

15. Tuveson, D. A. et al. Endogenous oncogenic K-ras(G12D) stimulates 
proliferation and widespread neoplastic and developmental defects. Cancer 
Cell 5, 375-387 (2004). 

16. Reya, T., Morrison, S. J., Clarke, M. F. & Weissman, I. L. Stem cells, cancer, and 
cancer stem cells. Nature 414, 105-111 (2001). 

17. Wang, J. C. & Dick, J. E. Cancer stem cells: lessons from leukemia. Trends Cell 
Biol. 15, 494-501 (2005). 

18. Hermann, P. C. et al. Distinct populations of cancer stem cells determine tumor
growth and metastatic activity in human pancreatic cancer. Cell Stem Cell 1, 
313-323 (2007). 

19. Kim, M. P. et al. ALDH activity selectively defines an enhanced tumor-initiating 
cell population relative to CD133 expression in human pancreatic 
adenocarcinoma. PLoS ONE 6, e20636 (2011). 

20. Dosch, J. S., Ziemke, E. K., Shettigar, A., Rehemtulla, A. & Sebolt-Leopold, J. S. 
Cancer stem cell marker phenotypes are reversible and functionally 
homogeneous in a preclinical model of pancreatic cancer. Cancer Res. 75, 
4582-4592 (2015). 

21. Rhim, A. D. et al. EMT and dissemination precede pancreatic tumor formation. 
Cell 148, 349-361 (2012). 

22. Li, C. et al. c-Met is a marker of pancreatic cancer stem cells and therapeutic 
target. Gastroenterology 141, 2218-2227 (2011). 

23. Belkina, A. C. & Denis, G. V. BET domain co-regulators in obesity, inflammation 
and cancer. Nature Rev. Cancer 12, 465-477 (2012). 

24. Cleynen, I. & Van de Ven, W. J. The HMGA proteins: a myriad of functions
(review). Int. J. Oncol. 32, 289-305 (2008). 

25. Hung, G. et al. Characterization of target mRNA reduction through in situ 
RNA hybridization in multiple organ systems following systemic antisense 
treatment in animals. Nucleic Acid Ther. 23, 369-378 (2013). 

26. Seth, P. P. et al. Short antisense oligonucleotides with novel 2′-4′
conformationaly restricted nucleoside analogues show improved potency 
without increased toxicity in animals. J. Med. Chem. 52, 10-13 (2009). 

27. Li, N., Li, Q., Tian, X. Q., Qian, H. Y. & Yang, Y. J. Mipomersen is a promising 
therapy in the management of hypercholesterolemia: a meta-analysis of 
randomized controlled trials. Am. J. Cardiovasc. Drugs 14, 367-376 
(2014). 

28. Hong, D. et al. AZD9150, a next-generation antisense oligonucleotide inhibitor 
of STAT3 with early evidence of clinical activity in lymphoma and lung cancer. 
Sci. Transl. Med. 7, 314ra185 (2015). 

29. Lee, R. G., Crosby, J., Baker, B. F., Graham, M. J. & Crooke, R. M. Antisense
technology: an emerging platform for cardiovascular disease therapeutics.
J. Cardiovasc. Transl. Res. 6, 969-980 (2013).

30. Saad, F. et al. Randomized phase II trial of custirsen (OGX-011) in combination 
with docetaxel or mitoxantrone as second-line therapy in patients with 
metastatic castrate-resistant prostate cancer progressing after first-line 
docetaxel: CUOG trial P-06c. Clin. Cancer Res. 17, 5765-5773 (2011).

31. Koechlein, C.S. et al. High resolution imaging and computational analysis of 
hematopoietic cell dynamics in vivo. Nature Comm. (in the press). 


Supplementary Information is available in the online version of the paper. 


Acknowledgements We are grateful to I. Verma, M. Karin, and D. Cheresh for
advice and comments on the manuscript, A. Luo and T. Wang for technical 
support, G. Yeo for advice on Msi targeting, K. Jenne for advice on MRI imaging, 
N. Patel and P. Mischel for reagents and experimental advice, and E. O’Conner 
and K. Marquez for cell sorting. R.F. is a recipient of a California Institute for 
Regenerative Medicine interdisciplinary stem cell training program fellowship 
and received support from T32 HL086344 and T32 CA009523, C.K. received
support from T32 GM007752, N.K.L. received support from T32 GM007752 
and a National Research Service Award F31 CA206416, J.L.K. received support 
from National Institutes of Health (NIH)-F32CA136124 and an Advanced 
Postdoctoral Fellowship from the Juvenile Diabetes Research Foundation, 
and B.Z. received support from T32 GM007184-33 (Duke University). F.P. is a
recipient of a California Institute for Regenerative Medicine interdisciplinary 
stem cell training program fellowship and the University of California San Diego 
Clinical and Translational Research Institute KL2 Award. T.I. is the recipient of a
California Institute for Regenerative Medicine interdisciplinary stem cell training 
program fellowship, J.B. is supported by a postdoctoral fellowship from National 
Cancer Center, and T.R. was supported in part by a Leukemia and Lymphoma 
Society Scholar Award. P.M.G. and M.A.H. are supported by a Specialized 
Program of Research Excellence (SPORE) in Pancreatic Cancer, CA127297, 
a TMEN Tumor Microenvironment Network U54, a National Cancer Institute 
Cancer Center Support Grant P30 CA36727, and an Early Detection Research 
Network (EDRN) U01 CA111294, M.Sa. is supported by NIH DK078803 and
NIH CA1948339, J.K.S. is supported by NIH K08CA168999, R.S. is supported by
the Clinical and Translational Research Institute (CTRI) grant UL1TR001442,
and A.M.L. is supported by donations from Ride the Point. This work was also
supported by CA155620 to A.M.L., DK63031, HL097767, DP1 CA174422, and
R35 CA197699 to T.R., and CA186043 to A.M.L. and T.R.


Author Contributions R.F. designed and performed all experiments related 
to Msi expression and deletion, whole genome and target analysis, and 
ASO delivery in pancreatic cancer; N.K.L. designed and performed all
live imaging of Msi reporter pancreatic tumours, and provided functional
analysis of cancer stem cells, circulating tumour cells, and therapy 
resistance; R.F., N.K.L., and M.K. helped write the paper; D.V.J. performed 
histological analysis, and provided mouse and xenograft models; F.P., T.I., 
J.B., C.K., and B.Z. provided experimental data and advice; R.S. performed 
all bioinformatics analysis; M.Y., S.S., and H.O. provided Msi1−/− mice and
CLIP-seq analysis; M.V. and D.P. performed pathology/in situ hybridization
analysis; M.Sc. performed MRI analysis; J.K. and M.Sa. provided
experimental advice, tumour samples, and mouse models; J.S., A.M.L., 
M.V., P.A.G., and M.A.H. provided patient samples; Y.K. and R.M. designed, 
synthesized, and screened MSI ASOs, and provided advice on ASO-related 
experiments. A.M.L. and T.R. conceived the project, planned and guided the 
research, and wrote the paper. 


Author Information Microarray and RNA-seq data have been deposited in 

the Gene Expression Omnibus under accession numbers GSE73312 and 
GSE75797. Reprints and permissions information is available at
www.nature.com/reprints. The authors declare competing financial interests: details are
available in the online version of the paper. Readers are welcome to comment 
on the online version of the paper. Correspondence and requests for materials 
should be addressed to A.M.L. (alowy@ucsd.edu) or T.R. (treya@ucsd.edu). 




METHODS 

Mice. REM1 (Msi1eYFP/+) and REM2 (Msi2eGFP/+) reporter mice were generated
by conventional gene targeting (Genoway; Fig. 1); all of the reporter mice
used in experiments were heterozygous for the corresponding Msi allele. The
Msi1flox/flox (Msi1f/f) mice were generated by conventional gene targeting by
inserting LoxP sites around exons 1–4 (Genoway). The Msi2 mutant mouse,
B6;CB-Msi2Gt(pU-21T)2Imeg (Msi2−/−), was established by gene trap mutagenesis as
previously described’. H. Okano provided the Msi1−/− mice as previously
described’. The LSL-Kras G12D mouse, B6.129S4-Krastm4Tyj/J (stock number
008179), and the p53flox/flox mouse, B6.129P2-Trp53tm1Brn/J (stock number
008462), were purchased from The Jackson Laboratory. M. Sander provided
Ptf1a-Cre mice as previously described’. A. Lowy provided Pdx1-Cre mice
as previously described!’. Mice were bred and maintained in the animal care
facilities at the University of California San Diego. All animal experiments 
were performed according to protocols approved by the University of California 
San Diego Institutional Animal Care and Use Committee. No sexual dimorphism 
was noted in any mouse model. Therefore, males and females were equally used 
for experimental purposes and both sexes are represented in all data sets. 
Tumour analysis, tissue dissociation, and cell isolation. (A) Tumour wet weight 
was measured immediately following resection. Tumour volume was calculated 
using the standard modified ellipsoid formula ½(length × width²) (Figs 1i, 2b,
2i and 4e, f). (B) Mouse pancreatic tumours were washed in RPMI 1640 (Gibco, 
Life Technologies) and cut into 2-4mm pieces immediately after resection. 
Dissociation into a single cell suspension was performed using a Miltenyi Biotec 
Mouse Tumour Dissociation Kit (130-096-730). Briefly, tumour pieces were 
collected into gentleMACS C tubes containing RPMI 1640 dissociation enzymes, 
and further homogenized using a gentleMACS Dissociator. Samples were incubated
for 40 min at 37 °C under continuous rotation, then passaged through a
70 µm nylon mesh (Corning). Red blood cells were lysed using RBC Lysis Buffer
(eBioscience), and the remaining tumour cells were used for FACS analysis and cell 
sorting. (C) Freshly resected mouse brains were rinsed in PBS, placed in accutase 
(Life Technologies), and cut into <2 mm pieces. Samples were incubated for 15 min 
at 37 °C, then passaged through a 70 µm nylon mesh (Corning). Red blood cells
were lysed as above before FACS analysis and sorting of brain cells. (D) Bone mar- 
row cells were suspended in HBSS (Gibco, Life Technologies) containing 5% FBS 
and 2mM EDTA and were prepared for FACS analysis and sorting as previously 
described’. Analysis and cell sorting were performed on a FACSAria III machine 
(Becton Dickinson), and data were analysed with FlowJo software (Tree Star). 
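
The modified ellipsoid estimate of tumour volume used above (half the product of length and squared width) can be sketched as follows. This is only an illustration: the function name, the millimetre units, and the example dimensions are assumptions and are not taken from the paper.

```python
def tumour_volume(length_mm: float, width_mm: float) -> float:
    """Modified ellipsoid estimate: V = 1/2 * length * width^2.

    Units are an assumption for illustration: with calliper readings
    in mm, the result is in mm^3.
    """
    if length_mm < 0 or width_mm < 0:
        raise ValueError("dimensions must be non-negative")
    return 0.5 * length_mm * width_mm ** 2


# Hypothetical calliper readings, not data from the study.
example_volume = tumour_volume(10.0, 8.0)
print(example_volume)
```

By convention the longer axis is usually entered as the length and the shorter as the width, so the estimate is not inflated by squaring the larger dimension.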
Immunofluorescence and immunohistochemical staining. (A) Human primary 
pancreatic cancer tissues were fixed in 10% neutral buffered formalin and paraffin 
embedded at the Moores Cancer Center at University of California San Diego 
according to standard protocols. Sections (7 µm) were obtained and deparaffinized
in xylene. The University of Nebraska Medical Center Rapid Autopsy Pancreas 
Program provided a second cohort of human primary pancreatic cancer tissues and 
matched liver metastases. Pancreatic cancer tissues from KPC mice were fixed in
4% paraformaldehyde and paraffin embedded at the University of California San
Diego Histology and Immunohistochemistry Core at The Sanford Consortium
for Regenerative Medicine according to standard protocols. Sections (5 µm) were
obtained and deparaffinized in xylene. Antigen retrieval was performed for
20–40 min in 95–100 °C 1× citrate buffer, pH 6.0 (eBioscience). Sections were
blocked in TBS or PBS containing 0.1% Triton X-100 (Sigma-Aldrich), 10% goat
or donkey serum (Sigma-Aldrich), and 5% bovine serum albumin. (B) Single-cell
suspensions from mouse pancreatic tumours and brain. Cells isolated by FACS 
were suspended in DMEM (Gibco, Life Technologies) supplemented with 50% FBS 
and adhered to slides by centrifugation at 42g. Twenty-four hours later, cells were 
fixed with 4% paraformaldehyde (USB Corporation), washed in PBS containing 
0.1% Tween-20 (Sigma-Aldrich), and blocked with PBS containing 0.1% Triton 
X-100 (Sigma-Aldrich), 10% goat serum (Invitrogen), and 5% bovine serum albu- 
min (Invitrogen). (C) Single-cell suspensions from mouse bone marrow. Cells were 
allowed to settle onto chambered cover glass (LabTek) coated with poly-L-lysine
(Sigma) at 37 °C, fixed with 4% paraformaldehyde (USB Corporation), washed
in 1× Dako wash buffer (Dako), and blocked with Dako wash buffer containing
10% goat serum (Invitrogen). All incubations with primary antibodies were per- 
formed overnight at 4°C. For immunofluorescent staining, incubation with Alexa 
Fluor-conjugated secondary antibodies (Molecular Probes) was performed for 
1 h at 20–25 °C. 4′,6-Diamidino-2-phenylindole (DAPI) (Molecular Probes) was
used to detect DNA and images were obtained with a Confocal Leica TCS SP5 II 
(Leica Microsystems) or with a Nikon Eclipse E600 fluorescent microscope. For 
immunohistochemical staining, endogenous peroxidase was blocked by incubating 
slides in 3% H2O2 for 15 min before primary antibody. Incubation with biotinylated
secondary antibodies (Vector Laboratories) was performed for 45 min at 20-25°C. 
ImmPACT NovaRED Kit (Vector Laboratories) was used according to the manu- 
facturer’s protocol. Sections were counterstained with haematoxylin. The following 


primary antibodies were used for human tissue sections: rabbit anti-Msi1 (Abcam,
ab52865) 4 µg ml−1; rabbit anti-Msi2 (Abcam, ab76148) 1 µg ml−1; and mouse
anti-Keratin (Abcam, ab8068) 1:20. The following primary antibodies were used to 
stain mouse tissues: rabbit anti-ALDH1 (Abcam, ab24343) 1:200; rabbit anti-cMet 
(Abcam, ab5662) 1:250; chicken anti-GFP (Abcam, ab13970) 1:250 (for pancreatic 
tumours and brain) or 1:200 (for bone marrow); rabbit anti-Msi2 (Abcam, 
ab76148) 1:500 (for pancreatic tumours and brain) or 1:200 (for bone marrow); 
rat anti-Ki67 (eBioscience, 14-5698) 1:1,000; rat anti-Msi1 (eBioscience, 14-9896-
82) 1:500; mouse anti-Keratin (Abcam, ab8068) 1:10; and biotinylated DBA (Vector 
Laboratories, B-1035) 1:1,000. 

Pancreatic tumoursphere formation assay. (A) Pancreatic tumoursphere for- 
mation assays were performed on freshly isolated mouse pancreatic tumour cells 
or circulating tumour cells from peripheral blood modified from ref. 33. Briefly, 
pancreatic tumours from 10- to 13-week-old REM1-KPf/fC or REM2-KPf/fC mice
were dissociated and FACS sorted for YFP+ and YFP− or EpCAM+/GFP+ and
EpCAM+/GFP− cells, respectively. One hundred to 500 cells were suspended in
100 µl DMEM F-12 (Gibco, Life Technologies) containing 1× B-27 supplement
(Gibco, Life Technologies), 3% FBS, 100 µM β-mercaptoethanol (Gibco, Life
Technologies), 1× non-essential amino acids (Gibco, Life Technologies), 1× N2
supplement (Gibco, Life Technologies), 20 ng ml−1 EGF (Gibco, Life Technologies),
20 ng ml−1 FGF2 (Gibco, Life Technologies), and 10 ng ml−1 ESGRO mLIF
(Millipore). Culture medium for circulating tumour cells also contained 20 ng ml−1
mHGF (R&D Systems). Cells in medium were plated in 96-well ultra-low adhe- 
sion culture plates (Costar) and incubated at 37°C for 7 days. Sphere images were 
obtained with a Nikon 80i fluorescence microscope. Sphere size was measured 
using ImageJ software version 1.47. 

Lentiviral constructs and production. shRNA constructs were designed and 
cloned into plenti-hU6BX vector with a GFP tag by Cellogenetics. The target 
sequences were 5′-CCCAGATAGCCTTAGAGACTAT-3′ for MSI1,
5′-CCCAGATAGCCTTAGAGACTAT-3′ for MSI2, and
5′-CTGTGCCAGAGTCCTTCGATAG-3′ for the control scrambled sequence. Additional
shRNA target sequences were cloned into a plenti-FG12 vector with a TomatoRed tag.
These target sequences were 5′-ATGAGTTAGATTCCAAGACGAT-3′ for MSI2 and
5′-AGGATTCCAATTCAGCGGGAGC-3′ for the control scrambled sequence. Virus was
produced in 293T cells transfected with plenti-shRNA constructs along with pRSV/ 
REV, pMDLg/pRRE, and pHCMVG constructs. Viral supernatants were collected 
for 3 days followed by ultracentrifugal concentration at 50,000g for 2 h.

Agarose colony formation assays. MIA PaCa-2, Panc-1, Capan-2, and HPAC 
human pancreatic cancer cell lines were purchased from American Type Culture 
Collection, and cultured in the appropriate growth media as recommended by 
American Type Culture Collection. ASPC1, FG, and AA0779E human pancreatic 
cancer cell lines were provided by A. Lowy, and grown in DMEM containing 10% 
FBS, 1× Glutamax, and 1× penicillin and streptomycin. Human pancreatic cancer
cell lines were infected with GFP-tagged or TomatoRed-tagged lentiviral particles 
containing shRNAs for MSI1, MSI2, and a scrambled control. Positively infected 
cells were sorted 72h after transduction. For colony assays, 24-well plates were 
first coated with 0.6% agarose in DMEM without supplements. Cells were plated 
at a density of 2,000 cells per well in 0.3% agarose containing DMEM, 10% FBS, 
NEAA, penicillin and streptomycin, and Glutamax. Growth medium was placed 
over the solidified agarose layers and was supplemented every 3 days. Colonies 
were counted 14 days after plating. 

MRI. MRI was used to determine the pancreatic volumes of the mice in vivo. Mice 
were anaesthetized using 1.5% isoflurane and imaged in a 7.0T small animal scanner 
(Bruker-Biospin). Contiguous coronal slices were acquired using a multi-slice, 
rapid acquisition with relaxation enhancement (RARE) sequence: repetition time/ 
echo time = 4826 ms/33 ms, field of view = 6 × 3 cm, and matrix = 126 × 128 with
up to 44 slices with a thickness of 0.5 mm. Segmentation and volume rendering
were performed using Amira software (FEI Visualization Sciences Group). 
Histological analysis/quantification of PanIN and pancreatic ductal 
adenocarcinoma. Mouse tumours from 4.5- to 13-week-old Msi1−/−-KPf/fC,
Msi2−/−-KPf/fC mice, and WT-KPf/fC littermates were isolated, fixed in 4%
paraformaldehyde, and paraffin embedded according to standard protocols. Sections
(5 µm) were obtained for haematoxylin and eosin and periodic acid-Schiff/Alcian
blue staining. To quantify tumour areas, each slide was digitally scanned with an 
Aperio slide scanner. Imagescope software was used to measure pancreatic ductal 
adenocarcinoma area, PanIN area, and normal pancreas area. 

Gene expression microarray, RNA-seq, and bioinformatic analysis. (A) WT-KPf/fC
or Msi1−/−-KPf/fC mice were euthanized at 11 weeks of age. Tumours were
harvested and total cellular RNAs were purified, labelled, and hybridized onto
Affymetrix GeneChip Mouse Genome 430 2.0 arrays and raw hybridization data
were collected (VA/VMRF Microarray and NGS Core, University of California San
Diego). Expression level data were extracted using R package gcrma**5, and
normalized using a multiple-loess algorithm as previously described**. Probes whose
expression levels exceed a threshold value in at least one sample were considered
detected. The threshold value is found by inspection from the distribution plots
of log2 expression levels. Detected probes were sorted according to their q value,
which is the smallest false discovery rate at which a probe is called significant*”.
A false discovery rate value of α is the expected fraction of false positives among all
genes with q < α. False discovery rate was evaluated using significance analysis of
microarrays and its implementation in the official statistical package samr**. The
samples were treated as ‘two class paired’ according to the date of RNA extraction.
No genes reached a significance level of α = 0.1. A heat map of selected genes was
created using in-house software. (B) MIA PaCa-2 cells were infected with GFP- 
tagged or TomatoRed-tagged lentiviral particles containing shRNAs for MSI1, 
MSI2, MSI1 + MSI2, and a scrambled control. At 72h after infection, positively 
infected cells were sorted and total cellular RNAs were isolated using a Qiagen 
RNeasy mini kit. RNA-seq fastq files were processed into transcript-level sum- 
maries using kallisto, an ultrafast pseudo-alignment algorithm with expectation 
maximization. Transcript-level summaries were processed into gene-level sum- 
maries by adding all transcript counts from the same gene. Gene counts were 
normalized across samples using DESeq normalization®’, and the gene list was 
filtered on the basis of mean abundance, which left 13,684 ‘detected’ genes for 
further analysis. Differential expression was assessed with an R package limma””
applied to log2-transformed counts. Statistical significance of each test was
expressed in terms of posterior error probability pE using the limma function
eBayes‘”. Posterior error probability, also called local false discovery rate, is the
probability that a particular gene is not differentially expressed, given the prior
probabilities of the model. The list of genes sorted by pE (in ascending order)
was analysed for over-represented biological processes and pathways using a
non-parametric version of gene set enrichment analysis**“*. Denoting pE(1) as the
probability that a gene is not differentially expressed in the Msi1 knockdown and pE(2)
the probability that a gene is not differentially expressed in the Msi2 knockdown,
the probability that a gene is differentially expressed in both samples was estimated
as [1 − pE(1)][1 − pE(2)]. By the same token, the probability that a gene is
differentially expressed in the Msi1 knockdown but not in the Msi2 knockdown was
estimated as [1 − pE(1)]pE(2); likewise with indices 1 and 2 switched.
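
The combination of the two posterior error probabilities described above can be sketched as below. It is a minimal illustration that treats the two knockdowns as independent, as the product formulas imply; the function name, the dictionary keys, and the example values are hypothetical and do not come from the paper.

```python
def joint_de_probabilities(pe1: float, pe2: float) -> dict[str, float]:
    """Combine posterior error probabilities from two knockdowns.

    pe1, pe2: probability that a gene is NOT differentially expressed
    in knockdown 1 and knockdown 2, respectively. The products below
    assume the two calls are independent.
    """
    for p in (pe1, pe2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return {
        "both": (1.0 - pe1) * (1.0 - pe2),   # changed in both knockdowns
        "only_1": (1.0 - pe1) * pe2,         # changed in knockdown 1 only
        "only_2": pe1 * (1.0 - pe2),         # indices 1 and 2 switched
        "neither": pe1 * pe2,                # changed in neither
    }


# Hypothetical values for a single gene.
result = joint_de_probabilities(0.1, 0.4)
print(result)
```

The four outcomes are mutually exclusive and exhaustive, so their probabilities sum to one, which is a convenient sanity check on any per-gene table built this way.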

Reverse transcription PCR. RNA was isolated using RNeasy Micro and Mini kits 
(Qiagen) and converted to cDNA using Superscript III (Invitrogen). Quantitative 
PCR was performed using an iCycler (BioRad) by mixing cDNAs, iQ SYBR Green 
Supermix (BioRad), and gene specific primers. Primer sequences are available 
upon request. All real-time data were normalized to actin or GAPDH. 

In vivo transplantation assay and analysis. In vivo we focused on the tumorigenic 
potential of Msi2 reporter cells since Msi1+ cells were unable to form tumours in
small numbers (100, 1,000), possibly because they are less tumorigenic or more
quiescent (data not shown). Pancreatic tumours from 10- to 13-week-old REM2-
KPC mice were dissociated and FACS sorted for EpCAM+/reporter+ (GFP+)
and EpCAM+/reporter− (GFP−) cells. GFP+ and GFP− cells (100, 500, 1,000, or
5,000) were suspended in DMEM (Gibco, Life Technologies) containing 10% FBS,
then mixed 1:1 with matrigel (BD Biosciences). Cells were injected
subcutaneously into the left or right flank or orthotopically into the tail of the pancreas
of 5- to 8-week-old NOD/SCID Il2rγ−/− (NSG) recipient mice. Subcutaneous
tumour dimensions were measured with callipers every 7 days for 8-12 weeks. 
At endpoint, flank tumours were removed, volume calculated, and dissociated as 
described above. Tumour cells were stained with anti-mouse EpCAM antibody 
(eBiosciences) then analysed for GFP expression by flow cytometry on a FACSAria 
III machine (Becton Dickinson), and data analysed with FlowJo software (Tree 
Star). Subcutaneous tumours did not exceed 2 cm in diameter as per the University
of California San Diego Institutional Animal Care and Use Committee Policy on 
Experimental Neoplasia. 

Patient-derived xenograft infection and in vivo transplant. Patient samples were 
obtained from Moores Cancer Center at the University of California San Diego from 
Institutional Review Board-approved protocols with written informed consent in 
accordance with the Declaration of Helsinki. All knockdown experiments were
conducted with the constructs shCTRL (scrambled), shMSI1, and shMSI2. Briefly, freshly
dissociated (GentleMACS Dissociator, Miltenyi) patient-derived xenograft cells were
plated in RPMI-1640 with 20% FBS, 1× Glutamax, 1× non-essential amino acids,
100 IU ml−1 penicillin, and 100 µg ml−1 streptomycin. Cells were transduced with
GFP-tagged lentiviral shRNAs, and FACS analysis was performed after 24h on a por- 
tion of the cells; the remaining cells were transplanted into the flank of 5- to 8-week- 
old NSG recipient mice. Tumour size was monitored by calliper measurement, and 
mice were euthanized when tumours reached 2 cm in diameter. Subcutaneous 
tumours did not exceed 2 cm in diameter as per the University of California San 
Diego Institutional Animal Care and Use Committee Policy on Experimental 
Neoplasia. Tumours were harvested, dissociated, and analysed by FACS. 
RIP-qPCR. HEK 293T cells were transfected with MSCV-Flag-Msi2-IRES-tNGFR
and lysed 72 h after transfection. RNA-immunoprecipitation was
performed with anti-Flag antibody (Sigma-Aldrich) or control immunoglobulin-G
(IgG) using an EZ-Magna RIP kit according to the manufacturer’s protocol 
(Millipore). Immunoprecipitated RNA was converted to cDNA and analysed for 
the expression of indicated genes by real-time PCR. 

CLIP-seq. Briefly, MIA PaCa-2 cells were ultraviolet cross-linked with a 
Stratalinker (Model 2400, Stratagene). Cells were lysed and supernatant added to 
Dynabeads conjugated to MSI1 antibody (clone 14H1, eBiosciences). CLIP library 
preparation and sequencing, as well as sample preparation and sequencing, were 
performed as previously described*®. A total of 73,329 unique tags were obtained 
from MSI1-bound targets including tags with the binding core sequence ‘rUAG’ 
site, as reported previously*®. 

MET rescue assay. Using gateway technology, pENTR-Human cMET was engi- 
neered into the phLENTI-PGK-PURO DEST vector. MIA PaCa-2 cells were infected 
with pLENTI PGK-MET or pLENTI PGK-EMPTY virus. After the establishment 
of the stable cell line over-expressing cMET, lentiviruses containing shRNAs for 
Control, MSI1, or MSI2 were delivered. Cells were sorted for GFP expression and 
plated into a soft agar colony assay. Colonies were counted 14 days after plating. 
In vivo and in vitro drug therapy. Nine- to 10-week-old REM2-KPf/fC mice were
treated with gemcitabine alone or in combination with crizotinib or iBet762 for
6 days. On day 6, tumours were removed, dissociated (as described above), counted
for total cellular content, stained with anti-mouse EpCAM antibody, and
analysed for reporter expression by flow cytometry. Gemcitabine (Sigma, G6423) was
resuspended in H2O at 20 mg ml−1 and delivered at 200 mg per kg (body weight)
or 500 mg per kg (body weight) by intraperitoneal injection twice over 6 days (on
days 0 and 3). Crizotinib (Selleckchem PF-02341066) was resuspended in
dimethylsulfoxide (DMSO) at 50 mg ml−1, diluted 1:10 in H2O, and delivered at 100 mg per
kg (body weight) per day for 6 days by oral gavage. iBet762 (Selleckchem S7189) was
resuspended in DMSO at 50 mg ml−1, diluted 1:10 in H2O, and delivered at 30 mg
per kg (body weight) per day by intraperitoneal injection for 6 days. For in vitro
drug assay, low-passage Msi2 reporter KPf/fC cells were loaded with 2 µM DiI and
imaged continuously for up to 48 h while receiving 10 µM gemcitabine treatment.
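
As a worked example of the dosing arithmetic implied above, the sketch below converts a dose in mg per kg (body weight) into an injection volume for a given stock concentration. The 25 g body weight is a hypothetical value, the function name is illustrative, and reading "diluted 1:10" as a tenfold reduction in concentration is an assumption; none of these details is stated in the paper.

```python
def injection_volume_ml(body_weight_g: float,
                        dose_mg_per_kg: float,
                        conc_mg_per_ml: float) -> float:
    """Volume (ml) needed to deliver dose_mg_per_kg to one animal."""
    if body_weight_g <= 0 or conc_mg_per_ml <= 0 or dose_mg_per_kg < 0:
        raise ValueError("weight and concentration must be positive, dose non-negative")
    dose_mg = dose_mg_per_kg * body_weight_g / 1000.0  # g -> kg
    return dose_mg / conc_mg_per_ml


MOUSE_WEIGHT_G = 25.0  # hypothetical body weight, not from the paper

# Gemcitabine: 20 mg/ml stock, 200 mg per kg.
gem_ml = injection_volume_ml(MOUSE_WEIGHT_G, 200.0, 20.0)

# Crizotinib: 50 mg/ml in DMSO, diluted 1:10 (assumed to give 5 mg/ml), 100 mg per kg.
criz_ml = injection_volume_ml(MOUSE_WEIGHT_G, 100.0, 50.0 / 10.0)

print(gem_ml, criz_ml)
```

Under these assumptions a 25 g mouse would receive 5 mg of gemcitabine and 2.5 mg of crizotinib per administration; actual volumes depend on each animal's measured weight.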
ASO inhibitors. To identify human Msi ASO inhibitors, rapid throughput screens 
were performed to identify effective ASOs as previously described‘”*. ASOs were 
tested in full dose-response experiments to determine potency. The top two most 
effective ASOs were chosen to test free uptake and verify target knockdown in MIA 
PaCa-2 cells. The sequences of Gen 2.5 MSI1 ASOs used for the study were ASO-1, 
5′-ATATGATACAGGACGG-3′ and ASO-2, 5′-TTACATATGATACAGG-3′, with
underlined letters indicating cEt-modified bases. The sequence of Gen 2.5
scrambled (5′-GGCTACTACGCCGTCA-3′) ASO with no perfect match for any known
transcript was included as a negative control. (A) In vitro: MIA PaCa-2 cells were
treated with 0.5–20 µM of antisense compound for 24 h, after which cells were lysed
and RNA isolated. Gene expression was assessed with Taqman probes for MSI1 and 
MSI2. Actin was used to normalize all real-time data. For functional testing, MIA 
PaCa-2 cells were plated in the colony assay as previously described. The growth 
medium was supplemented with 0.25–10 µM of ASO. Cells were supplemented
weekly with fresh antisense compound. Colonies were counted 21 days after the 
first ASO treatment. (B) In vivo: 5 x 10° MIA PaCa-2 cells were transplanted into 
the flank of 5- to 8-week-old NSG recipient mice. Once tumours were measurable
at 2 weeks after transplant, 50 µg of either control ASO or MSI1 ASO-1 in PBS was
administered intratumorally. ASOs were delivered daily over the course of the 
study. Tumour measurements were recorded every 3 days. Subcutaneous tumours 
did not exceed 2cm in diameter as per the University of California San Diego 
Institutional Animal Care and Use Committee Policy on Experimental Neoplasia. 
(C) In vivo: in 8-week-old WT-KPf/fC mice, either control ASO or Malat1 ASO 
was delivered by intraperitoneal injection at a dose of 50 mg per kg (body weight). 
ASOs were delivered daily for 14 days. On day 15, mice were killed and the tumour 
removed. Tumours were harvested and used as follows: (1) flash frozen for RNA 
isolation and qPCR analysis for Malat1; (2) placed into 4% paraformaldehyde for 
paraffin embedding, sectioning, and in situ hybridization analysis for Malat1; and 
(3) dissociated and sorted for RNA isolation to compare Malat1 expression in 
EpCAM+/ALDH+ and EpCAM+/ALDH− populations.

Tumour imaging. Eleven- to 12-week-old REM-KPf/fC mice were anaesthetized
by intraperitoneal injection of ketamine and xylazine (100/20 mg per kg (body 
weight)). To visualize blood vessels and nuclei, mice were injected retro-orbitally 
with Alexa Fluor 647 anti-mouse CD144 (VE-cadherin) antibody and Hoechst 
33342 immediately after anaesthesia induction. Pancreatic tumours were removed 
and placed in HBSS containing 5% FBS and 2 mM EDTA. Images (80–100 µm
in 1,024 × 1,024 format) were acquired with an HCX APO L20× objective on an
upright Leica SP5 confocal system using Leica LAS AF 1.8.2 software. Videos 
were generated using Volocity 3D image analysis software and compressed using 
Microsoft Video 1 compression. 

Circulating tumour cell analysis. Ten- to 13-week-old REM2-KPf/fC mice were
anaesthetized and approximately 100–500 µl of peripheral blood and ascites was
collected in PBS containing 5 mM EDTA and 2% dextran. Samples were
incubated at 37 °C and red blood cells were lysed using RBC lysis buffer (eBiosciences).
Remaining cells were stained with anti-mouse EpCAM-PE (eBiosciences) and
anti-mouse CD45-PE-Cy7 (eBiosciences) antibodies. Analysis was performed 
on a FACSAria III machine (Becton Dickinson) and data analysed with FlowJo 
software (Tree Star). 

In situ hybridization. Msi1 and Msi2 mRNA were detected in tumour samples
using RNAscope, an RNA in situ hybridization method that allows signal
amplification and background suppression. Human tissue was drop-fixed in neutral-
buffered formalin and processed and embedded in paraffin. Tissue sections (4 µm)
were collected in an RNase-free manner and dried at room temperature overnight.
Staining was initiated by baking the slides for 32 min at 60°C, then they were 
deparaffinized, subjected to antigen retrieval, and treated with protease 
(two sequential incubations at 65°C and 75°C for 12 min each) to enhance probe 
penetration, as described by the manufacturer (Advanced Cell Diagnostics). 
Msi1-specific and Msi2-specific RNA target probe sets were generated and
supplied by the manufacturer (Advanced Cell Diagnostics). Sequential
amplification steps resulted in a large number of horseradish peroxidase molecules
per mRNA. The probe was visualized by incubation with 3,3′-diaminobenzidine
(DAB). Sections were counterstained with haematoxylin. All steps of this pro- 
cedure were performed using a Ventana Discovery Ultra (Roche). Slides were 
analysed by conventional light microscopy. 

Msi1−/−-KPf/fC survival curve. For the Msi1−/−-KPf/fC mice, tracking survival
was complicated by the incidence of hydrocephaly observed in the knockout mice
reported previously’. To avoid confounding the data with deaths due to non-
tumorigenic events, we performed orthotopic transplants. Briefly, Msi1−/−-KPf/fC
and WT-KPf/fC mice at 8 weeks of age were killed and tumours collected. Tumours
were divided into four equal chunks, and then surgically transplanted into the 
pancreas of 8-week-old NSG mice. After surgery, the orthotopically transplanted 
mice were tracked for survival. 

Luciferase assay. A Lightswitch Luciferase Assay System (Active Motif) was 
used to assess MSI1 regulation of cMET. Briefly, 1 × 10^4 MIA PaCa-2 cells were
plated into 96-well plates and cultured for 24h. Fifty nanograms of cMET 3’ 
UTR GoClone (S811259, Active Motif) plasmid DNA and increasing concentra-
tions (0 ng, 50 ng, and 100 ng) of either PGK-GFP or PGK-MSI1 plasmid vector
DNA were co-transfected into MIA PaCa-2 cells. After 24h, cells were lysed using 
Lightswitch Luciferase Assay Reagent (LS100, Active Motif) and luciferase activity 
measured using a plate scanner (Infinite 200, Tecan). 

Caerulein-induced pancreatitis. Four-week-old C57BL/6 mice received 8
injections of 50 µg per kg (body weight) caerulein (Sigma-Aldrich) or PBS hourly each
day for 2 consecutive days (for a total of 16 injections). Pancreata were isolated 
2 days after the last injection, fixed in 4% paraformaldehyde, and paraffin embedded 
according to standard protocols. Sections (7 µm) were obtained, deparaffinized in
xylene, and stained as described above. 


Statistical analysis. Statistical analyses were performed using GraphPad Prism 
software version 6.0d (GraphPad Software). Sample sizes were determined on the 
basis of the variability of pancreatic tumour models used. Tumour-bearing animals 
within each group were randomly assigned to treatment groups. The investigators 
were not blinded to allocation during experiments and outcome assessment. Data 
are shown as the mean ± s.e.m. Two-tailed unpaired Student's t-tests with Welch's
correction or one-way ANOVA for multiple comparisons when appropriate were
used to determine statistical significance (*P < 0.05, **P < 0.01, ***P < 0.001,
****P < 0.0001).
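
The summary statistics named above can be illustrated with a short standard-library sketch. It is not the GraphPad Prism procedure used in the study: it computes the mean and s.e.m., the Welch t statistic with its Welch-Satterthwaite degrees of freedom, and the asterisk label for a given P value. Turning t into a P value needs a t-distribution routine from a statistics package and is left out. All function names and example numbers are hypothetical.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for one sample."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def significance_stars(p):
    """Map a P value to the asterisk notation listed in the text."""
    for threshold, label in ((0.0001, "****"), (0.001, "***"),
                             (0.01, "**"), (0.05, "*")):
        if p < threshold:
            return label
    return ""  # not significant at P < 0.05


# Hypothetical measurements, not data from the study.
control = [1.0, 2.0, 3.0]
treated = [2.0, 4.0, 6.0]
print(mean_sem(control), welch_t(control, treated), significance_stars(0.003))
```

Welch's correction matters when group variances differ, as in this toy example: the degrees of freedom fall below the pooled-variance value of n1 + n2 − 2.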
Extended Data Figure 1 | The Musashi genes MSI1 and MSI2
are expressed in human pancreatic adenocarcinoma. a, Top row:
representative images of a primary patient pancreatic adenocarcinoma
sample stained with anti-keratin (green), DAPI (blue), and anti-MSI1
(red) antibodies. White arrows indicate MSI1− cells; yellow arrow
indicates an MSI1+ cell. a, Bottom row: representative images of a primary
patient pancreatic adenocarcinoma sample stained with anti-keratin
(green), DAPI (blue), and anti-MSI2 (red) antibodies. White dotted
regions indicate MSI2− cells while yellow dotted regions indicate MSI2+
cells. b, Top row: representative images of a primary patient pancreatic
adenocarcinoma sample stained with anti-keratin (green), DAPI (blue),
and anti-MSI1 (red) antibodies. White arrows indicate MSI1− cells; yellow
arrow indicates an MSI1+ cell. b, Bottom row: representative images of a
primary patient pancreatic adenocarcinoma sample stained with
anti-keratin (green), DAPI (blue), and anti-MSI2 (red) antibodies. Yellow
dotted region indicates MSI2+ cells. c, Top row: representative images of a
matched liver metastasis from a patient with pancreatic adenocarcinoma
stained with anti-keratin (green), DAPI (blue), and anti-MSI1 (red)
antibodies. White arrows indicate MSI1− cells; yellow arrows indicate
MSI1+ cells. c, Bottom row: representative images of a matched liver
metastasis from a patient with pancreatic adenocarcinoma stained with
anti-keratin (green), DAPI (blue), and anti-MSI2 (red) antibodies. Yellow
dotted region indicates MSI2+ cells. d, Quantification of MSI1 and MSI2
expression in four patients comparing primary pancreatic adenocarcinoma
to the patient-matched liver metastasis; four images analysed per patient.
e, Quantification of the frequency of MSI1+ and MSI2+ cells in four
patients comparing primary pancreatic adenocarcinoma to the
patient-matched liver metastasis; four images analysed per patient. f, MSI1 and (g)
MSI2 expression in normal pancreas (n = 1), PanIN (n = 9), and pancreatic
adenocarcinoma samples (n = 9). h, Quantification of MSI2 expression
from a human tissue array comparing grade 1 (well-differentiated,
n = 9), grade 2 (moderately differentiated, n = 12), and grade 3 (poorly
differentiated, n = 16) adenocarcinoma relative to normal pancreas
(n = 14) and normal adjacent pancreas (n = 16). i, MSI1 and (j) MSI2
expression in well-differentiated, moderately differentiated, and poorly
differentiated human pancreatic cancer cell lines (n = 3 independent
experiments). k, Colony formation of well-differentiated, moderately
differentiated, and poorly differentiated human pancreatic cancer cell lines
(n = 3 independent experiments). Data are represented as mean ± s.e.m.
Total magnification ×200 (a–c). Source data for all panels are available
online.
Extended Data Figure 2 | Validation of Msi1 and Msi2 reporter mice.
a, FACS analysis of Msi2 reporter expression in haematopoietic stem cells,
progenitors, and lineage-positive differentiated cells. b, Representative
image of Msi1 expression in FACS-sorted YFP+ neuronal cells; YFP
(green), Msi1 (red), and DAPI (blue). c, Representative image of Msi2
expression in FACS-sorted GFP+ haematopoietic cells; GFP (green),
Msi1 (red), and DAPI (blue). d, e, Msi expression in keratin+ cells.
d, Msi1-YFP reporter (green, white arrows) and keratin (red) staining was
performed on tissue sections of REM1-KPf/fC mice; e, Msi2-GFP reporter
(green, white arrows) and keratin (red) staining was performed on tissue
sections of REM2-KPf/fC mice. DAPI staining is shown in blue. Rare cells
(<5%) were found to be keratin− (possibly mesenchymal population).
f, Immunofluorescence analysis of Msi1 and Msi2 expression overlap
in isolated EpCAM+ KPC cells (n = 3, 1,000 total cells analysed from
3 independent experiments). Data are represented as mean ± s.e.m.
g, h, Survival of Msi reporter-KPf/fC and WT-KPf/fC mice. Survival curves
of (g) Msi1eYFP/+-KPf/fC (REM1-KPC, n = 21) or WT-KPC (n = 18)
mice and (h) Msi2eGFP/+-KPf/fC (REM2-KPf/fC, n = 65) or WT-KPf/fC
(n = 54) mice. i, Live image of Msi2 reporter cells in REM2-KPf/fC tumour;
VE-cadherin (magenta), Hoechst (blue), Msi reporter (green). See also
Fig. 1c, d. Source data for all panels are available online.
Extended Data Figure 3 | Analysis of stem cell traits in Msi1 and Msi2
reporter+ KPf/fC populations. a, ALDH expression in reporter+ tumour
cells sorted from REM1-KPf/fC (top row) and REM2-KPf/fC (bottom row)
mice; ALDH1 (red), DAPI (blue), and GFP or YFP (green). b, Average
ALDH expression in bulk or Msi1 and Msi2 reporter+ tumour cells
(n = 3 each; 90 total cells analysed from 3 REM1-KPf/fC and 150 total cells
analysed from 3 REM2-KPf/fC). c, Average Msi expression in ALDH+
cells from REM1-KPf/fC and REM2-KPf/fC tumours (n = 3 independent
experiments for each genotype). d, e, Representative images of spheres
formed from (d) Msi1 and (e) Msi2 reporter+ and reporter− tumour
cells. See also Fig. 1g, h. f, g, In vivo tumour growth of Msi2 reporter+ or
Msi2 reporter− KPC cells at (f) 500 or (g) 1,000 cells (n = 16). See also
Fig. 1i. h, Survival of mice orthotopically transplanted with 10,000 Msi2
reporter+ and reporter− KPf/fC tumour cells (n = 6). See also Fig. 1j.
Log-rank (Mantel-Cox) survival analysis (P < 0.05). i, j, Reporter
frequency in REM2-KPf/fC mice treated with vehicle or 200 mg per kg
(body weight) gemcitabine (n = 3 each). See also Fig. 1m, n for high-dose
(500 mg per kg (body weight)) gemcitabine. Data are represented as
mean ± s.e.m. ***P < 0.001 by Student’s t-test or one-way ANOVA.
k, Msi2 reporter− KPf/fC cells do not turn on Msi2 expression after in vitro
gemcitabine treatment, suggesting that Msi-reporter+ cells are
differentially resistant to gemcitabine. Low-passage Msi2 reporter KPC
cells loaded with DiI were live-imaged continuously for up to 48 h.
Representative series of images from 10 µM gemcitabine treatment.
Reporter− cells (red); GFP reporter+ cells (green); tracking of Msi2
reporter− cells (white arrows); tracking of Msi2 reporter+ cells (yellow
arrows) (n = 3 independent experiments). Source data for all panels are
available online.
Extended Data Figure 4 | Analysis of tumours from Msi null KPf/fC
mice. a, Msi2 (green) and Keratin (red) immunofluorescent staining was
performed on tissue sections from WT pancreas (normal, n=3 samples),
KRASG12D/+;Ptf1aCre/+ (PanIN, n=2 samples), and KRASG12D/+;p53f/f;
Ptf1aCre/+ (pancreatic ductal adenocarcinoma, n=3 samples) mice with
quantification of Msi2 fluorescence in keratin+ cells. b, Average weights of
WT-KPf/fC (n=13) and Msi1−/−-KPf/fC tumours (n=9). See also Fig. 2a, b
for tumour volume analysis. c, PAS and Alcian blue stained sections of
pancreata isolated from WT-KPf/fC represent areas used to identify the
stages of PanINs (yellow boxes) and adenocarcinoma (red box). d, Tumours
from 11- to 13-week-old WT-KPf/fC (n=6), Msi1−/−-KPf/fC (n=3), and
Msi2−/−-KPf/fC (n=3) mice were stained and quantified for percentage
of Keratin+ tumour cells (red) expressing Ki67 (green); DAPI staining is
shown in blue. e, Average weights of WT-KPf/fC (n=5) and Msi2−/−-KPf/fC
tumours (n=7). See also Fig. 2h, i for tumour volume analysis. Data
are represented as mean ± s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001 by
Student’s t-test or one-way ANOVA. Source data for all panels are available
online.
Extended Data Figure 5 | Selection for escaper Msi-expressing
cells in Msi1, Msi2 single and double knockout KPf/fC mice.
a–c, Immunohistochemical staining for (a) IgG control (n=4) or (b, c, red)
Msi2 in 13-week-old WT-KPf/fC (n=4) and Msi2−/−-KPf/fC (n=4) mice.
d, Immunohistochemical staining for Msi2 (red) in 22-week-old
Msi2−/−-KPf/fC mouse (n=1). e–g, Immunohistochemical staining for (e)
IgG control, (f, red) Msi1, and (g, red) Msi2 in a 15-week-old Msi1Msi2−/−
double knockout KPf/fC mouse (n=1). h, Survival curves of Msi1Msi2−/−-
KPf/fC (n=6) or WT-KPf/fC tumours (n=35). Source data for all panels are
available online.
Extended Data Figure 6 | Genome-wide analysis of Msi controlled
programs in pancreatic cancer. a, Genome-wide expression analysis
of dissociated pancreatic tumours. Microarray analysis was performed
on RNA from three pairs of WT-KPf/fC and Msi1−/−-KPf/fC matched
littermates. Heat map shows differential expression of selected mRNAs
identified as part of a stem-cell-associated gene signature. b, Concordantly
(upper right and lower left quadrants) and discordantly (upper left and
lower right quadrants) regulated genes (red) in MSI1-knockdown and
MSI2-knockdown MIA PaCa-2 cells. c, Gene changes specific to
MSI1-knockdown (turquoise) or MSI2-knockdown (purple) in MIA
PaCa-2 cells. d, Heat maps indicating concordant, MSI1-specific, and
MSI2-specific genes. e, Venn diagram displaying the intersection of
probe sets that are differentially regulated in MSI1-knockdown,
MSI2-knockdown, and double knockdown of MSI1 and MSI2 in MIA PaCa-2
cells. Within scatterplots, lighter colour corresponds to a probability > 0.5
and the darker colour corresponds to a probability > 0.75. Source data for
all panels are available online.
Extended Data Figure 7 | Molecular targets of Msi signalling.
a, b, Quantitative PCR analysis of (a) Msi1 and (b) Msi2 expression in
MIA PaCa-2 human pancreatic cancer cells relative to normal pancreas
(n=3 independent experiments). c, d, Analysis of shRNA knockdown
efficiency in GFP+-sorted MIA PaCa-2 cells infected with GFP-tagged
lentiviral shRNA against scrambled control sequences, (c) MSI1, or
(d) MSI2 (n=3 independent experiments). e, Analysis of direct Msi
targets: Msi consensus binding sites in 3’ UTR of BRD4, HMGA2, and
cMET transcripts. f, g, Phospho-cMet staining in WT-KPf/fC and (f)
Msi1−/−-KPf/fC, (g) Msi2−/−-KPf/fC mice; keratin (magenta), phospho-cMet
(green), DAPI (blue). See Fig. 3b–c for quantified data. h, Colony
formation of MIA PaCa-2 cells infected with empty vector or cMET
overexpression vector (three independent experiments) shows no
impact of overexpressed cMet on control MIA PaCa-2 (control for
cMet-mediated rescue of MSI knockdown in Fig. 3f). Data are represented as
mean ± s.e.m. ***P < 0.001, ****P < 0.0001 by Student’s t-test. Source
data for all panels are available online.
Extended Data Figure 8 | Analysis of impaired pancreatic cancer
growth with shMSI and MSI1-ASOs. a, Schematic for inhibiting MSI
in primary patient-derived xenografts. b, c, Frequency of GFP+ patient
tumour cells before and after transplantation. See also Fig. 4a, b for
patients 1 and 2. d, e, MSI1 expression after free uptake of (d) control ASO
or (e) MSI1-ASO2 in human pancreatic cancer line (n=3 per condition).
See also Fig. 4c for impact of MSI1-ASO1. f–j, ASO delivery in vivo.
f, Target knockdown efficacy of lead-optimized ASO in KPf/fC stem cells.
Malat1 expression in EpCAM+/ALDH− and EpCAM+/ALDH+ cells
after systemic delivery of control ASO or lead-optimized Malat1-ASO in
autochthonous KPf/fC model (n=3 independent experiments). See also
Fig. 4h for target knockdown in unfractionated EpCAM+ cells.
g, h, Analysis of potential toxicity of MSI-ASO: g, cage weight of mice
[…] or vehicle by intraperitoneal injection; four mice per cage; cage weight
was measured every 3 days; h, average body weight of mice after 3 weeks
of daily treatment with MSI1 ASO-1 (50 mg per kg (body weight)) or
vehicle by intraperitoneal injection (n=4 mice/cohort). In vivo delivery
of MSI1 ASOs (50 mg per kg (body weight)) had no deleterious impact
on body weight and maintained plasma chemistry markers (AST, ALT,
BUN, T.Bil) within 3× upper limit of normal. i, j, Representative images
of in situ hybridization for Malat1 (purple) in pancreatic tumours isolated
from KPf/fC mice treated by daily intraperitoneal injection with (i) control
ASO (50 mg per kg (body weight)) or (j) Malat1-ASO (50 mg per kg (body
weight)) for 14 days. Source data for all panels are available online.
Extended Data Figure 9 | Elevated expression of Msi in pancreatitis.
Msi2 expression in a caerulein-induced mouse model of pancreatitis,
and in human pancreatitis. a, Msi2 staining and (b) quantification
of ten images per group in pancreas from PBS-treated (a, top panels,
n=1) and caerulein-treated mice (a, bottom panels, n=1). c, Msi2
immunohistochemical staining in islets (black dotted outlines) and
acinar cells (blue squares) in caerulein-treated or PBS-treated mice
(n=1 for each group). d, Immunofluorescent staining of Msi2 (green)
in DBA+ ductal cells (red) treated with PBS (left panels) or caerulein
(right panels) (n=1 for each group); DAPI is shown in blue. e, MSI2
expression in human tissue arrays from patients presenting with mild
chronic inflammation (n=4) and chronic pancreatitis (n=6) compared
with normal pancreas (n=14). Data are represented as mean ± s.e.m.
P < 0.0001 by Student’s t-test. Source data for all panels are available
online.
The bacterial DnaA-trio replication origin element
specifies single-stranded DNA initiator binding 


Tomas T. Richardson1, Omar Harran1 & Heath Murray1


DNA replication is tightly controlled to ensure accurate inheritance 
of genetic information. In all organisms, initiator proteins 
possessing AAA+ (ATPases associated with various cellular
activities) domains bind replication origins to license new rounds 
of DNA synthesis1. In bacteria the master initiator protein, DnaA,
is highly conserved and has two crucial DNA binding activities’. 
DnaA monomers recognize the replication origin (oriC) by binding 
double-stranded DNA sequences (DnaA-boxes); subsequently, 
DnaA filaments assemble and promote duplex unwinding by 
engaging and stretching a single DNA strand**. While the 
specificity for duplex DnaA-boxes by DnaA has been appreciated 
for over 30 years, the sequence specificity for single-strand 
DNA binding has remained unknown. Here we identify a new 
indispensable bacterial replication origin element composed of a 
repeating trinucleotide motif that we term the DnaA-trio. We show 
that the function of the DnaA-trio is to stabilize DnaA filaments 
on a single DNA strand, thus providing essential precision to 
this binding mechanism. Bioinformatic analysis detects DnaA- 
trios in replication origins throughout the bacterial kingdom, 
indicating that this element is part of the core oriC structure. The 
discovery and characterization of the novel DnaA-trio extends our 
fundamental understanding of bacterial DNA replication initiation, 
and because of the conserved structure of AAA+ initiator proteins
these findings raise the possibility of specific recognition motifs 
within replication origins of higher organisms. 

The master bacterial DNA replication initiator, DnaA, is a highly 
conserved multifunctional protein that utilizes distinct domains to 
achieve its two key DNA binding activities. DnaA recognizes
double-stranded (ds)DNA using a helix-turn-helix motif (domain IV),
whereas an ATP-dependent DnaA filament interacts with a single
DNA strand using residues within the initiator specific motif (ISM;
an α-helical insertion that distinguishes the family of replication
initiators) of the AAA+ domain (domain III) (Extended Data Fig. 1a–d)**.
In contrast to DnaA, bacterial replication origins are diverse; they con- 
tain variable numbers of DnaA-boxes and seemingly lack a common 
architecture®”. Therefore, the sequence information within oriC that 
directs DnaA filament assembly onto a single DNA strand is unknown. 

To investigate how DnaA filament formation could be local- 
ized to the DNA replication origin of Bacillus subtilis, we began by 
characterizing site-directed mutants of the DNA unwinding region 
in vivo (Fig. 1a and Extended Data Fig. 1e). To enable identification
of essential sequences without selecting for suppressor mutations, 
we generated a strain in which DNA replication could initiate from 
a plasmid origin (oriN) integrated into the chromosome (Fig. 1b and 
Supplementary Information). Activity of oriN requires its cognate 
initiator protein, RepN; both of these factors act independently of 
oriC/DnaA®*. Expression of repN was placed under the control of a 
tightly regulated inducible promoter, thus permitting both the intro- 
duction of mutations into oriC and their subsequent analysis after 
removal of the inducer to shut off oriN activity (Fig. 1c and Extended 
Data Fig. 2). 


At the B. subtilis replication origin, DNA unwinding by DnaA is
detected downstream of DnaA-box elements and includes a sequence
of 27 continuous A:T base pairs that is thought to facilitate DNA
duplex opening (Fig. 1a)’. Surprisingly, we were able to delete the entire
AT-rich sequence (Δ27) without abolishing origin activity, although
the mutant strain did display a slow growth phenotype, indicating
that the AT-cluster is required for efficient origin function (Fig. 1d).
Interestingly, further deletions extending three or six base pairs (Δ30,
Δ33) severely impaired oriC-dependent initiation (Fig. 1d), and a
deletion series targeting the sequence between the GC-rich and AT-rich
clusters confirmed that this region alone was essential for origin
function (Fig. 1e). Scrambling this entire region also inhibited cell growth
(t1–t6Scr), demonstrating that the specific sequence is required, rather
than the spacing between the flanking elements (Fig. 1f). To explore
this region in more detail, sequences were scrambled three base pairs
at a time by exchanging each triplet for its complement. Phenotypic
and marker frequency analyses revealed that disruption of sequences
closest to the GC-cluster (t1Scr and t2Scr) caused the greatest defect in
DNA replication initiation, indicating that the region proximal to the
DnaA-boxes is most important for origin activity (Fig. 1f). Although
mutagenesis of neither t4 nor t5 alone produced a detectable effect
on DNA replication initiation under the conditions tested, they may
become important when origin firing is suboptimal, as was observed
in the AT-cluster deletion mutant (Fig. 1d).
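
The triplet-scrambling scheme described above (exchanging one triplet at a time for its complement) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, the 18-nucleotide region is the run of six B. subtilis triplets as transcribed from the extracted Fig. 4c sequence (written 3’ to 5’), and only one strand is modelled. It is not the construction procedure used in the study.

```python
# Sketch of single-triplet scrambling by complementation.
# Assumptions (not from the paper's methods): helper names are hypothetical,
# and REGION is the six-triplet run transcribed from the extracted Fig. 4c
# B. subtilis sequence, written 3'->5'. Only one strand is modelled.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement_triplet(region: str, index: int) -> str:
    """Return `region` with triplet `index` (0-based) swapped for its complement.

    The triplet is complemented base by base and is not reversed.
    """
    start = 3 * index
    if index < 0 or start + 3 > len(region):
        raise ValueError("triplet index out of range")
    triplet = region[start:start + 3]
    return region[:start] + triplet.translate(COMPLEMENT) + region[start + 3:]


def scramble_series(region: str) -> list[str]:
    """One mutant per triplet, analogous to single-triplet mutants t1..t6."""
    return [complement_triplet(region, i) for i in range(len(region) // 3)]


REGION = "GATGATAATGAAGATGAT"  # assumed t1..t6, 3'->5'

if __name__ == "__main__":
    for n, mutant in enumerate(scramble_series(REGION), start=1):
        print(f"t{n}Scr: {mutant}")
```

Each mutant differs from the parent at exactly one triplet, so flanking spacing is preserved while the local sequence changes, which is the property the scrambling experiment relies on.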

To determine whether this essential DNA sequence between the 
GC- and AT-clusters has a role in DNA melting per se, an open com- 
plex formation assay was performed. DnaA was incubated with oriC 
plasmids containing either the wild-type or scrambled sequence 
(t1–t6Scr), potassium permanganate was added to oxidize distorted
bases within the DNA, and base modification was detected by primer 
extension. Scrambling the sequence inhibited open complex forma- 
tion, indicating that this region is necessary for DnaA-dependent 
unwinding (Fig. 1g). 

DnaA monomers are thought to bind DnaA-boxes before 
ATP-dependent filament formation’®. Using the strain capable of 
oriC-independent initiation, the seven DnaA-box sequences were 
individually scrambled to abolish DnaA binding". Culturing these 
strains in the absence of oriN activity revealed that mutation of DnaA- 
box6 severely inhibited growth, and mutation of DnaA-box7 resulted 
in a marked growth defect, while mutation of the remaining DnaA- 
boxes had no observable effect (Fig. 2a). Marker frequency analysis 
confirmed that mutation of DnaA-box7 markedly impaired origin 
activity, whereas mutation of the remaining DnaA-boxes resulted in 
only modest decreases in initiation frequency (Fig. 2a). These results 
indicate that DnaA-boxes proximal to the essential unwinding region 
are most critical for origin activity. 

To directly test whether these DnaA-boxes promote DnaA filament
assembly at the essential unwinding region we used a previously
described DnaA filament formation assay’*. Here two cysteine
residues are introduced within the AAA+ domain such that the
protein remains functional and, when the DnaA filament assembles,
the cysteine residues from interacting protomers come into close
proximity. DNA scaffolds were assembled using oligonucleotides,
and the cysteine-specific crosslinker bis(maleimido)ethane (BMOE;
8 Å spacer arm) was used to capture the oligomeric species formed
on each substrate.

Incubation of DnaA with duplex substrates containing DnaA-box6,
DnaA-box7 and the GC-rich region produced a dimeric
species (Fig. 2b, c), whereas incubation of DnaA with a longer duplex
substrate containing the unwinding region produced a set of larger
oligomeric complexes. We wondered whether the larger species were
being formed on the duplex DNA or on a single DNA strand. To test
these models, scaffolds containing single-stranded (ss)DNA tails were
assembled. DnaA filaments readily formed on substrates containing
a 5’-tail but were absent when the corresponding 3’-tail was provided
(Fig. 2c and Extended Data Fig. 3). Formation of DnaA oligomers on
the 5’-tailed substrate was dependent upon both ATP and the ssDNA
binding residue Ile190 located within the ISM of the AAA+ domain4,
indicating that the assay was capturing DnaA filament formation on
ssDNA (Fig. 2c, d and Extended Data Fig. 1d).

Figure 1 | Genetic analysis of the oriC DNA unwinding element reveals
a critical region required for initiation activity. a, B. subtilis oriC
unwinding region. DnaA-box colouring indicates conservation (consensus
5’-TTATCCACA-3’ in black). b, The oriC-independent strain used for
constructing replication origin mutations. c, Growth of an oriC deletion
mutant is dependent upon oriN activity. d, Deletions extending beyond the
AT-cluster into the initially unwound region inhibit cell growth.
e, Sequences between the GC-rich and AT-rich clusters are essential for
origin function. f, Sequences proximal to DnaA-boxes are most important
for origin function. Marker frequency analysis was used to measure
the rate of DNA replication initiation (mean and s.d. of three technical
replicates). g, Open complex formation by DnaA requires the native
sequence between the GC- and AT-clusters. DNA duplex unwinding
was probed by KMnO4 and detected by primer extension.

Critically, DnaA oligomer formation on the 5’-tailed substrate was
specific. DnaA filament assembly was abolished when the DnaA- 
box sequences within the duplex region were scrambled and it was 
notably reduced when the single-stranded region was replaced with 
its complementary sequence (Fig. 2c). Taken together, these results 
suggest that DnaA filaments are loaded from duplex DnaA-boxes onto 
ssDNA bearing a 5’-tail. This model is consistent both with biochemical
experiments showing that Escherichia coli DnaA preferentially
interacts with the corresponding single-strand of its DNA unwinding 
element and with single molecule studies showing that Aquifex aeolicus 
DnaA filaments form with 3’—5’ polarity!!!>"4. 

DnaA oligomer size was proportional to the length of the 5’-tail up
to the formation of a heptamer, after which further DNA extension
did not promote longer filaments (Fig. 2e, f). We noted that this limit
corresponded to a poly(A) tract in the DNA sequence and wondered
whether this sequence inhibited DnaA filament formation. When the
poly(A) tract was replaced by sequences from the beginning of the
DNA unwinding region, DnaA oligomer length increased beyond a
heptamer (Fig. 2e, f). This result suggests that the origin unwinding
region is designed to limit DnaA filament formation to a precise
position within oriC.

Figure 2 | DnaA filaments are loaded from DnaA-boxes onto a specific
single-strand sequence within the initially unwound region.
a, DnaA-boxes proximal to the unwinding region are most important for
origin function. Marker frequency analysis was used to measure the rate of
DNA replication initiation (mean and s.d. of three technical replicates).
b, Sequence of the origin region used for constructing DNA scaffolds in c.
c, DnaA filament formation using cysteine-specific crosslinking on DNA
scaffolds. DnaA complexes were resolved by SDS-polyacrylamide gel
electrophoresis (SDS-PAGE) and detected by western blot analysis.
d, Crystal structure showing ssDNA (dA12) bound to the DnaA filament
through the AAA+ domain (Protein Data Bank (PDB) accession number
3R8F). e, Sequence of the origin region used for constructing DNA
scaffolds in f. f, DnaA filament formation on tailed substrates is arrested
by a poly(A) tract. Long oligomers highlighted within the dotted box are
shown above with increased contrast.

To identify a possible single-strand binding motif recognized by
DnaA, individual base pairs within the essential unwinding region
were inverted and origin activity was analysed in vivo. Marker
frequency analysis revealed that altering either of two A:T base pairs,
which were spaced three nucleotides apart from each other, resulted in
the most significant loss of origin activity; in contrast the surrounding
mutations had only modest effects (Fig. 3a). Re-examination of the
unwinding region shows that A:T base pairs are spaced at three
nucleotide intervals throughout this sequence (Fig. 1a). This observation
is strikingly congruent with the mechanism proposed for binding of
the DnaA filament to ssDNA, where each protomer engages a set of
three nucleotides (Fig. 3b)4.

Figure 3 | Analysis of the key origin unwinding region provides
evidence for functional trinucleotide repeats. a, Mutagenesis identifies
A:T base pairs spaced three nucleotides apart as most critical for origin
activity. Marker frequency analysis was used to measure the rate of DNA
replication initiation (mean and s.d. of three technical replicates).
b, Crystal structure showing the interaction of DnaA with sets of three
nucleotides (PDB accession number 3R8F). Residues for A. aeolicus
indicated above; B. subtilis below. c, In vivo deletion analysis of the
unwinding region. Isogenic deletions indicated in black. Marker frequency
analysis was used to measure the rate of DNA replication initiation
(mean and s.d. of three technical replicates). d, Growth of mutants used
in c. e, In vitro deletion analysis of tailed substrates.

We hypothesized that an array of triplet nucleotide motifs recognized 
by DnaA are present within the unwinding region and that the motifs 
proximal to the DnaA-boxes are most important for origin activity. 
To test this model in vivo we created a set of nested deletions that 
removed either one or three base pairs (Extended Data Fig. 4). All of 
the single base-pair deletions significantly lowered the replication ini- 
tiation frequency and several considerably inhibited cell growth (espe- 
cially the same A:T base pairs noted above), whereas triplet deletions 
encompassing the single deletions had little or no effect (Fig. 3c, d). 
These results are consistent with the model that single base-pair 
deletions act both by disrupting a specific trinucleotide motif and by 
shifting the register of downstream trinucleotide motifs relative to the 
DnaA filament start point at the DnaA-boxes. 

To test the model that the ssDNA binding motif is indeed a repeating
trinucleotide, DnaA filament formation was analysed in vitro using
tailed substrates that contained either single or triplet base deletions
(Fig. 3e). Whereas deletion of one base produced shorter oligomers,
deletion of three bases restored formation of full-length complexes.
Taken together with the in vivo deletions, these results indicate that
DnaA filaments bind to ssDNA by recognizing a specific trinucleotide
motif found within the unwinding region. We have termed this
trinucleotide motif the ‘DnaA-trio’.

To define the precise sequence of the DnaA-trio, DnaA filament 
formation was observed using a series of DNA scaffolds in which the 
5’-tails were extended by increments of one nucleotide. We observed 
that additional oligomeric species appeared after the following 
sequences were added: 3’-GAT-5’, 3’-AAT-5’ and 3’-GAA-5’, suggesting
that these triplets represent individual DnaA-trio motifs (Fig. 4a).
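
The register argument behind the single-base versus triplet deletions can be illustrated with the three triplets just listed. The sketch below is a toy model under explicit assumptions: the function names are hypothetical, the tail is the six-triplet B. subtilis run transcribed from the extracted Fig. 4c sequence (3’ to 5’), and "recognition" is reduced to set membership of in-register triplets. It makes no claim about actual binding energetics.

```python
# Toy model of register shift after deletions in a trio array.
# Assumptions: hypothetical helper names; TAIL is transcribed from the
# extracted Fig. 4c B. subtilis sequence (3'->5'); a triplet "counts" if it
# is one of the three triplets reported in the text.

TRIOS = {"GAT", "AAT", "GAA"}
TAIL = "GATGATAATGAAGATGAT"


def in_register_trios(tail: str) -> int:
    """Count triplets, read from the start of the tail, that are known trios."""
    usable = len(tail) - len(tail) % 3  # ignore a trailing partial triplet
    triplets = [tail[i:i + 3] for i in range(0, usable, 3)]
    return sum(t in TRIOS for t in triplets)


def delete(tail: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 0-based position `start`."""
    return tail[:start] + tail[start + length:]


if __name__ == "__main__":
    print("intact tail:     ", in_register_trios(TAIL))
    print("1-base deletion: ", in_register_trios(delete(TAIL, 4, 1)))
    print("3-base deletion: ", in_register_trios(delete(TAIL, 3, 3)))
```

In this toy model a single-base deletion in the second triplet leaves only the first triplet in register, whereas removing the whole second triplet keeps every remaining triplet in register, matching the qualitative pattern reported for the deletion series.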

However, it was surprising that a longer oligomer was not formed 
after addition of the first 3’-GAT-5’ motif proximal to the GC-cluster, 
since mutagenesis of this sequence in vivo resulted in strong phenotypes
(Fig. 3a, c, d). In structures of the archaeal initiator Orc1 bound
to a replication origin the protein was observed to make two contacts 
with the DNA, one through its carboxy (C)-terminal DNA bind- 
ing domain (analogous to DnaA domain IV) and another through 
its AAA+ motif'>!®. We wondered whether DnaA might similarly 
be capable of contacting both a DnaA-box and the first DnaA-trio, 
thereby accounting for the absence of a DnaA trimer. Importantly, 
BMOE crosslinking of cysteines in the AAA+ domain would not 
detect this activity as the assay captures DnaA oligomers formed on 
either dsDNA or ssDNA”. 

To test this hypothesis, we used the amine-specific crosslinker 
bis(sulfosuccinimidyl)suberate (BS3) which, in contrast to BMOE,
only captures DnaA oligomers formed on a single DNA strand
(Extended Data Figs 3 and 5). Crosslinking by BS3 reveals a DnaA
dimer forming in the presence of the first 3’-GAT-5’, indicating that 
DnaA does recognize this sequence (Extended Data Fig. 5). Taken 
together with the BMOE crosslinking showing that a DnaA dimer is 
formed on the dsDNA scaffold containing just DnaA-boxes 6 and 7 
and the GC-cluster, the data suggest that the DnaA protein initially 
bound at DnaA-box7 undergoes a conformational change (detected 
by BS3) to engage the first DnaA-trio motif following the GC-cluster.
Several lines of evidence support the notion that DnaA adopts distinct 
conformations when it engages either dsDNA or ssDNA!” 

To support the assignment of the DnaA-trio, we performed a tar- 
geted mutagenesis of the proposed sequence. The results indicate that 
each of the positions (3’-GAT-5’) appears important for DnaA fila- 
ment formation, specifically the nucleotides at positions 1 and 2, and 
the deoxyribose group at position 3 (Fig. 4b and Extended Data Fig. 6). 
Interestingly, in the crystal structure of DnaA bound to a ssDNA 
substrate, the protein makes no base-specific contacts4. These
observations suggest either that the sequence of the DnaA-trios is
important for an intermediate step in DNA duplex recognition and melting
before full engagement of the product single-strand, or that the specific 
base sequence promotes the DNA backbone to adopt a favourable 
geometry for DnaA binding. 

Using this information, we first searched for DnaA-trios within 
other well-characterized origin unwinding elements (Fig. 4c, 
underlined)!" In these cases a set of at least three DnaA-trios could 
be identified. These DnaA-trios were located proximal to a DnaA-box 
that shared the same orientation as B. subtilis DnaA-box7, and the 
regions between the DnaA-box and the DnaA-trios were GC-rich. 
Using these additional criteria we next interrogated predicted bacte- 
rial DNA replication origins (DoriC’) for similar patterns. Figure 4c 
shows that similar elements can be identified within putative oriC 
regions throughout the bacterial kingdom. A sequence logo of the 
DnaA-trios indicates that the preferred motif is 3’-(G/A)AT-5’ (Fig. 4d),
with the central adenine being most highly conserved. We also 
observed that in most cases a pair of tandem DnaA-boxes preceded 
the GC-cluster (Extended Data Table 1). 
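
As a rough illustration of the search described above, the following sketch scans a sequence for runs of at least three tandem triplets matching the stated consensus. The assumptions are significant: the function name and regular expression are hypothetical, the strict consensus [G/A]AT is a simplification of the sequence logo, and the additional criteria used in the study (DnaA-box orientation and a GC-rich spacer) are not modelled. The two test sequences are transcribed from the extracted Fig. 4c text, written 3’ to 5’.

```python
# Sketch: locate candidate DnaA-trio arrays as >= 3 tandem [G/A]AT triplets.
# Assumptions: hypothetical names; strict consensus only; sequences are given
# 3'->5' as in the extracted Fig. 4c. DnaA-box orientation and the GC-rich
# spacer used as extra criteria in the text are NOT checked here.
import re

TRIO_ARRAY = re.compile(r"(?:[GA]AT){3,}")


def find_trio_arrays(seq_3to5: str) -> list[tuple[int, int]]:
    """Return (start index, number of tandem trios) for each candidate array."""
    seq = seq_3to5.upper()
    return [(m.start(), len(m.group()) // 3) for m in TRIO_ARRAY.finditer(seq)]


B_SUBTILIS = "TTTAGGTGTCCGGGATGATAATGAAGATGAT"
L_MONOCYTOGENES = "TTTAGGTGTCGCGGATAATGATAATGAT"

if __name__ == "__main__":
    for name, seq in [("B. subtilis", B_SUBTILIS),
                      ("L. monocytogenes", L_MONOCYTOGENES)]:
        print(name, find_trio_arrays(seq))
```

Because the strict pattern excludes 3’-GAA-5’, the B. subtilis array is reported as three trios rather than six; relaxing the third position would be one way to accommodate the variants observed experimentally.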

We propose that the DnaA-trio constitutes a new element within
bacterial replication origins. Our findings indicate that DnaA-trios
play an essential role during DNA replication initiation by providing
specificity for DnaA filament formation on a single DNA strand,
thereby promoting DNA duplex unwinding. Our analysis also indicates
that the arrangement of tandem DnaA-boxes in close proximity
to DnaA-trios is a widespread strategy used to direct DnaA filament
growth onto the unwinding region, with a single DnaA protein probably
binding dsDNA via domain IV before engaging a DnaA-trio via its
AAA+ motif (Fig. 4e and Extended Data Fig. 5). Together our data are
consistent with the two-step DnaA assembly model for DNA melting"®.

[Figure 4 image. Panels a, b: anti-DnaA western blots of DnaA oligomers.
Panel c: origin sequences with DnaA-box, GC-rich region and DnaA-trios;
group labels Firmicutes, Cyanobacteria, Actinobacteria, Proteobacteria,
Spirochaetes and Gram negative. Panel d: sequence logo. Panel e: schematic.]

Panel c sequences paired with a species in the extracted text:
L. monocytogenes  3’-TTTAGGTGTCGCGGATAATGATAATGAT-5’
B. subtilis       3’-TTTAGGTGTCCGGGATGATAATGAAGATGAT-5’
O. iheyensis      3’-TATTAGTGTCCGGGATAATAATAATAATGAC-5’
C. botulinum      3’-TATAGTTGTTCGGGATGATGATGATGATGATAAA-5’
S. elongatus      3’-AAAAGGTGTCGGGGATGATGATGACTAG-5’
L. citreum        3’-TCTATTTGTGGGGAATAATAATGATGAT-5’

Panel c sequences listed with E. faecalis, S. pneumoniae and S. aureus
(pairing not recoverable from the extracted text):
3’-CTTAGTTGTACGGGATAATGAAAATAATAAATAA-5’
3’-CTAAAGTGTCTGAGATAATAATGATAAT-5’
3’-GTTAGGTGTCGTGGATGATGATAATGAT-5’

Panel c sequences listed with S. coelicolor, C. glutamicum, B. bifidum,
E. coli, B. pertussis, H. pylori, B. bacteriovorus, B. afzelii, T. pallidum,
A. aeolicus and T. maritima (pairing not recoverable from the extracted text):
3’-CATAGGTGTCCGGGATGATGATGAC-5’
3’-CAAAGTTGTCCTGAATGACAATAAT-5’
3’-GATAGGTGTCCCGAATAATGAT-5’
3’-AATAGGTGTCCCGTCACGCTAGGATTAT-5’
3’-AATAGGTTCGGGCATCACTACAATCAT-5’
3’-GGTAAGTGCGGGGATGATGACAATGATTAATAATAA-5’
3’-AAAAGGTGCGGGGATGATGATGATGAT-5’
3’-AATTTGTCTTCGGATAATGATAATGATGAT-5’
3’-TATAAGTGTCAATGATAAT-5’
3’-AAACAGTGTACTTTTATCGGCGGATAATAATAAGAATAATTAT-5’
3’-TTTGGATGGTGGACGCAGGGGATAATAAA-5’

Figure 4 | Identification of the DnaA-trio motif. a, Varying the length
of 5’-tailed substrates identifies the likely DnaA-trio sequence. Lane 2
shows DnaA filament formation on a duplex DNA scaffold (DnaA-box6,
DnaA-box7, GC-rich cluster). Letters indicate the nucleotide sequentially
added to the 5’-tail. b, Targeted mutagenesis of the proposed DnaA-trio
motif. c, Bioinformatic analysis identifies DnaA-trio motifs adjacent to
a DnaA-box throughout the bacterial kingdom. Underlined sequences
indicate experimentally determined DnaA-dependent unwinding sites.
d, DnaA-trios sequence logo (WebLogo”’). e, Schematic of DnaA filament
formation from double-stranded DnaA-boxes (triangles) onto a single
strand containing the DnaA-trios.

We note that the configuration between DnaA-boxes and the DnaA- 
trios is not strictly required for DnaA to be loaded onto the single 
DNA strand in vitro. Scaffolds containing either a single DnaA-box or 
containing DnaA-boxes in reverse orientation are competent to pro- 
mote DnaA filament formation from the duplex DNA onto the 5’-tail, 
although in the latter situation DnaA filament formation was reduced 
suggesting that DnaA-box orientation is important (Extended Data 
Fig. 7). Furthermore, loading does not require the flexibly tethered 
domains I/II of DnaA, consistent with previous observations suggest- 
ing that domains III/IV can adopt multiple conformations (Extended 
Data Fig. 7)!*!7!8. These results suggest that some plasticity can be 
accommodated between duplex and single-strand DNA binding ele- 
ments, which is in agreement with recent reanalyses of essential DnaA- 
boxes in E. coli and might also explain the location of atypical origin 
unwinding sites*?-7°. 


© 2016 Macmillan Publishers Limited. All rights reserved 


Analysis of replication initiator proteins from both bacteria and 
archaea shows that the ISM within AAA+ domains is used for DNA 
binding, and the recent structure of the Drosophila origin recogni- 
tion complex (ORC) suggests that this is also probably the case for 
eukaryotes, supporting the model that DNA binding by the ISM is a 
universal feature of replication initiators (refs 4, 15, 16, 27). We find here, for the
first time, that the interaction of the B. subtilis replication initiator ISM 
with the origin involves recognition of a specific DNA sequence. We 
speculate that motifs analogous to the DnaA-trio might be present in 
replication origins of higher organisms and recognized by the ISM of 
ORC proteins. These sites need not be trinucleotides, nor would they 
necessarily share the same spacing observed for the DnaA-trios as 
they would need to accommodate the arrangement of AAA + interac- 
tions within the respective heterohexameric ORC!””’. The discovery 
of ISM binding motifs in higher organisms would greatly facilitate 
origin identification, an elusive problem precluding the understanding 
of DNA replication control in eukaryotes. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 


Received 9 November 2015; accepted 30 March 2016. 
Published online 8 June 2016. 


1. Duderstadt, K. E. & Berger, J. M. A structural framework for replication origin 
opening by AAA+ initiation factors. Curr. Opin. Struct. Biol. 23, 144-153
(2013). 

2. Messer, W. The bacterial replication initiator DnaA. DnaA and oriC, the 
bacterial mode to initiate DNA replication. FEMS Microbiol. Rev. 26, 
355-374 (2002). 

3. Fuller, R. S., Funnell, B. E. & Kornberg, A. The dnaA protein complex with the 
E. coli chromosomal replication origin (oriC) and other DNA sites. Cell 38,
889-900 (1984). 

4. Duderstadt, K. E., Chuang, K. & Berger, J. M. DNA stretching by bacterial 
initiators promotes replication origin opening. Nature 478, 209-213 (2011). 

5. Fujikawa, N. et al. Structural basis of replication origin recognition by the DnaA 
protein. Nucleic Acids Res. 31, 2077-2086 (2003). 

6. Mackiewicz, P., Zakrzewska-Czerwinska, J., Zawilak, A., Dudek, M. R. & Cebrat, S. 
Where does bacterial replication start? Rules for predicting the oriC region. 
Nucleic Acids Res. 32, 3781-3791 (2004). 

7. Wolanski, M., Donczew, R., Zawilak-Pawlik, A. & Zakrzewska-Czerwinska, J. 
oriC-encoded instructions for the initiation of bacterial chromosome 
replication. Front. Microbiol. 5, 735 (2015). 

8. Hassan, A. K. et al. Suppression of initiation defects of chromosome replication
in Bacillus subtilis dnaA and oriC-deleted mutants by integration of a plasmid 
replicon into the chromosomes. J. Bacteriol. 179, 2494-2502 (1997). 

9. Krause, M., Ruckert, B., Lurz, R. & Messer, W. Complexes at the replication 
origin of Bacillus subtilis with homologous and heterologous DnaA protein.
J. Mol. Biol. 274, 365-380 (1997).

10. Leonard, A. C. & Grimwade, J. E. Regulation of DnaA assembly and activity: 
taking directions from the genome. Annu. Rev. Microbiol. 65, 19-35 (2011). 

11. Speck, C. & Messer, W. Mechanism of origin unwinding: sequential binding of 
DnaA to double- and single-stranded DNA. EMBO J. 20, 1469-1476 (2001). 

12. Scholefield, G., Errington, J. & Murray, H. Soj/ParA stalls DNA replication by 
inhibiting helix formation of the initiator protein DnaA. EMBO J. 31, 
1542-1555 (2012). 

13. Cheng, H. M., Groger, P., Hartmann, A. & Schlierf, M. Bacterial initiators form
dynamic filaments on single-stranded DNA monomer by monomer. Nucleic 
Acids Res. 43, 396-405 (2015). 




14. Ozaki, S. et al. A common mechanism for the ATP-DnaA-dependent formation of
open complexes at the replication origin. J. Biol. Chem. 283, 8351-8362 (2008). 

15. Gaudier, M., Schuwirth, B. S., Westcott, S. L. & Wigley, D. B. Structural basis of DNA 
replication origin recognition by an ORC protein. Science 317, 1213-1216 (2007). 

16. Dueber, E. L., Corn, J. E., Bell, S. D. & Berger, J. M. Replication origin recognition 
and deformation by a heterodimeric archaeal Orc1 complex. Science 317, 
1210-1213 (2007). 

17. Erzberger, J. P., Mott, M. L. & Berger, J. M. Structural basis for ATP-dependent 
DnaA assembly and replication-origin remodeling. Nature Struct. Mol. Biol. 13, 
676-683 (2006). 

18. Duderstadt, K. E. et al. Origin remodeling and opening in bacteria rely on distinct 
assembly states of the DnaA initiator. J. Biol. Chem. 285, 28229-28239 (2010). 

19. Krause, M. & Messer, W. DnaA proteins of Escherichia coli and Bacillus subtilis: 
coordinate actions with single-stranded DNA-binding protein and interspecies 
inhibition during open complex formation at the replication origins. Gene 228, 
123-132 (1999). 

20. Donczew, R., Weigel, C., Lurz, R., Zakrzewska-Czerwinska, J. & Zawilak-Pawlik, A. 
Helicobacter pylori oriC - the first bipartite origin of chromosome replication in 
Gram-negative bacteria. Nucleic Acids Res. 40, 9647-9660 (2012). 

21. Ozaki, S., Fujimitsu, K., Kurumizaka, H. & Katayama, T. The DnaA homolog of 
the hyperthermophilic eubacterium Thermotoga maritima forms an open 
complex with a minimal 149-bp origin region in an ATP-dependent manner. 
Genes Cells 11, 425-438 (2006). 

22. Gao, F., Luo, H. & Zhang, C. T. DoriC 5.0: an updated database of oriC regions in 
both bacterial and archaeal genomes. Nucleic Acids Res. 41, D90-D93 (2013). 

23. Kumar, S., Farhana, A. & Hasnain, S. E. In-vitro helix opening of M. tuberculosis
oriC by DnaA occurs at precise location and is inhibited by IciA like protein. 
PLoS ONE 4, e4139 (2009). 

24. Pei, H. et al. Mechanism for the TtDnaA-Tt-oriC cooperative interaction at high 
temperature and duplex opening at an unusual AT-rich region in 
Thermoanaerobacter tengcongensis. Nucleic Acids Res. 35, 3087-3099 (2007). 

25. Kaur, G. et al. Building the bacterial orisome: high-affinity DnaA recognition 
plays a role in setting the conformation of oriC DNA. Mol. Microbiol. 91, 
1148-1163 (2014). 

26. Noguchi, Y., Sakiyama, Y., Kawakami, H. & Katayama, T. The Arg fingers of key 
DnaA protomers are oriented inward within the replication origin oriC and 
stimulate DnaA subcomplexes in the initiation complex. J. Biol. Chem. 290, 
20295-20312 (2015). 

27. Bleichert, F., Botchan, M. R. & Berger, J. M. Crystal structure of the eukaryotic 
origin recognition complex. Nature 519, 321-326 (2015). 

28. Crooks, G. E., Hon, G., Chandonia, J. M. & Brenner, S. E. WebLogo: a sequence 
logo generator. Genome Res. 14, 1188-1190 (2004). 


Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank J. Errington and W. Vollmer for reviewing the 
manuscript. We thank G. Scholefield for preliminary data, A. Koh for research 
assistance and I. Selmes for technical assistance. Research support was 
provided to H.M. by a Royal Society University Research Fellowship and a 
Biotechnology and Biological Sciences Research Council Research Grant 
(BB/K017527/1), and to O.H. by an Iraqi Ministry of Higher Education and
Scientific Research Studentship. 


Author Contributions H.M. and T.T.R. conceived and designed experiments; 
H.M., T.T.R. and O.H. constructed plasmids and strains; H.M. and O.H. 
performed growth and marker frequency analysis experiments; H.M. 
performed microscopy experiments; T.T.R. purified proteins, performed the 
open complex assay, and performed the DnaA filament formation assays; 
H.M. and T.T.R. interpreted results and wrote the paper. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial 
interests. Readers are welcome to comment on the online version of the paper. 
Correspondence and requests for materials should be addressed to
H.M. (heath.murray@newcastle.ac.uk).




METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Media and chemicals. Nutrient agar (Oxoid) was used for routine selection and 
maintenance of both B. subtilis and E. coli strains. For experiments in B. subtilis, cells
were grown using Luria-Bertani medium. Supplements were added as required:
chloramphenicol (5 µg ml−1), erythromycin (1 µg ml−1), kanamycin (5 µg ml−1),
spectinomycin (50 µg ml−1). Unless otherwise stated, all chemicals and reagents
were obtained from Sigma-Aldrich. 

Phenotype analysis of oriC mutants using the inducible oriC-independent 
strain. Strains were grown for 18-72h at 37°C on nutrient agar plates either with 
or without IPTG (1 mM). All experiments were independently performed at least 
twice and representative data are shown. 

Marker frequency analysis. Genomic DNA was harvested from cells during the 
exponential growth phase and the relative amount of DNA from the replication 
origin (ori) and terminus (ter) was determined by qPCR. Strains were grown in Luria- 
Bertani medium to an absorbance (A600 nm) of 0.3-0.5, whereupon sodium azide
(0.5%) was added to prevent further metabolism. Chromosomal DNA was isolated
using a DNeasy Blood and Tissue Kit (Qiagen). The DNA replication origin
(oriC) region was amplified using primers 5′-GAATTCCTTCAGGCCATTGA-3′
and 5′-GATTTCTGGCGAATTGGAAG-3′; the region adjacent to oriN
was amplified using primers 5′-CTTTCTGCCGCAAAGGATTA-3′ and
5′-CCTCTTCATAGCCGTTTTGC-3′; the DNA replication terminus (ter)
region was amplified using primers 5′-TCCATATCCTCGCTCCTACG-3′ and
5′-ATTCTGCTGATGTGCAATGG-3′. Either Rotor-Gene SYBR Green (Qiagen)
or GoTaq (Promega) qPCR mix was used for PCR reactions. qPCR was performed
in a Rotor-Gene Q Instrument (Qiagen). By use of crossing points (CT) and PCR
efficiency, a relative quantification analysis (ΔΔCT) was performed using Rotor-
Gene Software version 2.0.2 (Qiagen) to determine the origin:terminus (ori:ter)
ratio of each sample. These results were normalized to the ori:ter ratio of a DNA 
sample from B. subtilis spores, which only contain one chromosome and thus 
have an ori:ter ratio of 1. Error bars indicate the standard deviation of three tech- 
nical replicates. All experiments were independently performed at least twice and 
representative data are shown. 
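
The normalization described above can be illustrated with a short calculation. This is a minimal sketch of an efficiency-corrected relative quantification, not the Rotor-Gene software's actual implementation: the function names, the default efficiency of 2.0 (perfect doubling per cycle) and all CT values are assumptions chosen for illustration. Spore DNA acts as the calibrator, so by construction it returns an ori:ter ratio of 1.

```python
from statistics import mean, stdev


def relative_quantity(ct_sample, ct_calibrator, efficiency=2.0):
    """Amount of a target in the sample relative to the calibrator.

    efficiency is the per-cycle amplification factor (2.0 = perfect doubling).
    """
    return efficiency ** (ct_calibrator - ct_sample)


def ori_ter_ratio(ct_ori, ct_ter, ct_ori_spore, ct_ter_spore,
                  eff_ori=2.0, eff_ter=2.0):
    """ori:ter ratio of a sample, normalized to spore DNA (ori:ter = 1)."""
    ori = relative_quantity(ct_ori, ct_ori_spore, eff_ori)
    ter = relative_quantity(ct_ter, ct_ter_spore, eff_ter)
    return ori / ter


# Hypothetical crossing points: spore calibrator and three technical replicates.
spore = {"ori": 20.0, "ter": 20.0}
replicates = [(18.0, 20.0), (18.1, 20.0), (17.9, 20.0)]  # (CT ori, CT ter)

ratios = [ori_ter_ratio(o, t, spore["ori"], spore["ter"]) for o, t in replicates]
print(f"ori:ter = {mean(ratios):.2f} (s.d. {stdev(ratios):.2f})")
```

A lower CT at the origin than at the terminus, relative to spores, gives a ratio above 1, which is the signature of ongoing replication initiation.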

Protein expression. BL21 (DE3)-pLysS cells were transformed with the appropri- 
ate expression construct (Supplementary Table 2) and selected on nutrient agar 
plates containing 100 ng µl−1 of ampicillin and 34 ng µl−1 of chloramphenicol.
A single transformant colony was used to inoculate an overnight starter culture
grown at 37°C, 180 rpm, in Luria-Bertani medium supplemented with 100 ng µl−1
of ampicillin and 34 ng µl−1 of chloramphenicol. The following morning a 1/100
dilution of overnight culture was used to inoculate 1,200 ml of Luria-Bertani
medium supplemented with 100 ng µl−1 of ampicillin and grown at 37°C, 180 rpm,
to A600 nm = 0.5. Cells were induced with 1 mM IPTG and cultured for a further 3h
at 30°C. Cells were pelleted at 3,000g, 4°C for 10 min before resuspension in 45 ml
of resuspension buffer (25mM HEPES-KOH (pH 7.6); 500 mM potassium glutamate;
10 mM magnesium acetate; 20% sucrose; 30 mM imidazole; 1× cOmplete
EDTA-free protease inhibitor tablet (Roche)). The cell suspension was then
flash-frozen in liquid nitrogen. 

Protein purification. DnaA (WT, WT-CC and I190A-CC) was purified as fol- 
lows. A frozen 50 ml BL21 cell pellet suspension was thawed on ice with 32 mg 
of lysozyme and gentle agitation for 1h then disrupted by sonication at 20 W for 
5 min in 2s pulses. Cell debris was pelleted by centrifugation at 31,000g, 4°C for 
45 min and the supernatant further clarified by filtration (0.45 µm). All subsequent
steps were performed at 4°C unless otherwise stated. The clarified lysate
was applied at 1 ml min−1 to a 1 ml HisTrap HP column (GE), which had previously
been equilibrated with Ni binding buffer (25 mM HEPES-KOH (pH 7.6);
250mM potassium glutamate; 10 mM magnesium acetate; 20% sucrose; 30 mM 
imidazole). The loaded column was washed with a 10 ml one-step gradient of 10% 
Ni elution buffer (25mM HEPES-KOH (pH 7.6); 250mM potassium glutamate; 
10mM magnesium acetate; 20% sucrose; 30 mM imidazole). Specifically bound 
proteins were eluted using a 7.5 ml one-step gradient of 100% Ni elution buffer and 
the entire fraction collected and diluted into 42.5 ml of Q binding buffer (30 mM 
Tris-HCl (pH 7.6); 100mM potassium glutamate; 10 mM magnesium acetate; 
1mM DTT; 20% sucrose). The diluted fraction was then applied at 1 ml min−1
to a 1 ml HiTrap Q HP column (GE), which had previously been equilibrated 
with Q binding buffer. The loaded HiTrap Q HP column was washed with 10 ml 
of Q binding buffer then eluted using a linear 10 ml gradient of 0-100% Q elution 
buffer (30 mM Tris-HCl (pH 7.6); 1M potassium glutamate; 10 mM magnesium 
acetate; 1mM DTT; 20% sucrose) with 1 ml fractions collected. The peak 3 × 1 ml
fractions, based on ultraviolet absorbance, were pooled and dialysed into 1 l of
Factor Xa cleavage buffer (25 mM HEPES-KOH (pH 7.6); 250 mM potassium
glutamate; 20% sucrose; 5mM CaCl2), using 3.5k MWCO SnakeSkin dialysis
tubing (Life Technologies) at 4°C overnight. The dialysed protein was diluted to
5 ml total volume in Factor Xa cleavage buffer and incubated at 23°C for 6h with
80 µg of Factor Xa protease (NEB). The sample was applied at 1 ml min−1 to a 1 ml
HisTrap HP column (GE), which had previously been equilibrated with Factor Xa 
cleavage buffer. The Factor Xa-cleaved fraction was eluted in 7.5 ml of Ni bind- 
ing buffer. The eluted fraction was diluted into 42.5 ml of Q binding buffer and 
purified on a 1 ml HiTrap Q HP column as previously described. Peak fraction(s) 
were pooled and dialysed into 1 l of final dialysis buffer (40 mM HEPES-KOH
(pH 7.6); 250mM potassium glutamate; 1 mM DTT; 20% sucrose; 20% PEG300),
using 3.5k MWCO SnakeSkin dialysis tubing (Life Technologies) at 4°C overnight
before aliquoting, flash-freezing in liquid nitrogen and storage at −80°C. Removal
of the amino (N)-terminal His-tag, after incubation with Factor Xa, was confirmed
by anti-pentaHis (Qiagen) western blotting. 
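
As a worked illustration of the dilution and gradient steps above, the following sketch estimates the potassium glutamate concentration after the Ni eluate is diluted into Q binding buffer, and at points along the linear Q elution gradient. It assumes ideal mixing and ignores column dead volume; the helper names are hypothetical and are not part of the published protocol.

```python
def mix_concentration(vol_a, conc_a, vol_b, conc_b):
    """Concentration after combining two solutions (ideal mixing assumed)."""
    return (vol_a * conc_a + vol_b * conc_b) / (vol_a + vol_b)


def linear_gradient(start, end, total_volume, volume):
    """Concentration delivered after `volume` of a linear gradient."""
    return start + (end - start) * (volume / total_volume)


# 7.5 ml of Ni eluate (250 mM potassium glutamate) diluted into
# 42.5 ml of Q binding buffer (100 mM potassium glutamate).
loaded = mix_concentration(7.5, 250.0, 42.5, 100.0)
print(f"Potassium glutamate on loading: {loaded} mM")

# 10 ml linear gradient from Q binding buffer (100 mM) to Q elution
# buffer (1 M), sampled at the midpoint of each 1 ml fraction.
for fraction in range(1, 11):
    midpoint = fraction - 0.5
    conc = linear_gradient(100.0, 1000.0, 10.0, midpoint)
    print(f"fraction {fraction}: ~{conc:.0f} mM")
```

The diluted sample loads at 122.5 mM potassium glutamate, close to the 100 mM of the Q binding buffer, so the dilution step brings the salt down to near binding conditions before the HiTrap Q column.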

C-terminally His-tagged DnaA (WT-CC and Δ(domainI-II)-CC) purification
was performed as for the tag-free variants, except that the protein was dialysed 
into final dialysis buffer after the first HiTrap Q HP column purification before 
aliquoting, flash-freezing and storing. 

HBsu purification was performed exactly as for DnaA, except that the HiTrap 
Q HP column was substituted for a 1 ml HiTrap Heparin HP column (GE) and 
the composition of buffers was modified accordingly. Ni binding buffer (25 mM 
Tris-HCl (pH 8.0); 400 mM NaCl; 30mM imidazole). Ni elution buffer (25 mM Tris- 
HCl (pH 8.0); 400 mM NaCl; 500 mM imidazole). Heparin binding buffer (25 mM 
Tris-HCl (pH 8.0); 100mM NaCl; 1mM EDTA). Heparin elution buffer (25 mM 
Tris-HCl (pH 8.0); 2M NaCl; 1mM EDTA). Factor Xa cleavage buffer (25 mM Tris- 
HCl (pH 8.0); 100 mM NaCl; 2mM CaCl2; 20% sucrose). Final dialysis buffer
(25 mM Tris-HCl (pH 8.0); 400 mM NaCl; 2mM CaCl2; 20% sucrose; 20%
PEG300). Peak fractions were determined by SDS-PAGE and Coomassie staining
owing to the absence of tryptophan, tyrosine and cysteine residues. 

Open complex formation assays. KMnO4 footprinting assays were essentially
performed as described in ref. 9, except for the following changes. DnaA was not 
pre-incubated with ATP. The unwinding buffer contained 2mM ATP, rather than 
5mM, and 500 ng of plasmid pTR541 (wild type) or pTR542 (t1-t6**) was used per 
75-µl-scale reaction. DnaA was added to final concentrations of 0, 100, 250, 500
and 1,000nM. Assembled reactions were incubated at 37°C for 10 min. KMnO4
treatment was then performed at 37°C for 10 min. Six microlitres of β-mercaptoethanol
was used to quench reactions; however, EDTA was omitted. KMnO4-treated
DNA was immediately purified using a Qiagen PCR clean-up kit, eluting
in 20 µl of EB buffer. KMnO4-treated templates were not linearized before primer
extension. Primer extensions were performed on a 20 µl scale using 0.1 U µl−1 of
Vent exo- DNA polymerase (NEB) in 1× manufacturer’s reaction buffer supplemented
with 4mM MgSO4, 200 µM each dNTP, 200 nM Cy5-labelled oligonucleotide
(5′-Cy5-AGCTTCAGCAGCATGTAAAAG-3′) and 4 µl of PCR-purified
template DNA per reaction. Reactions were subjected to thermocycling using a
3Prime thermal cycler (Techne) with 1 min initial denaturation at 98°C, followed
by 35 cycles of (10s at 98°C; 30s at 55°C; 30s at 72°C). Reactions were quenched
by addition of an equal volume of stop buffer (95% formamide; 10 mM EDTA;
10mM NaOH; 0.01% Orange-G) and products subjected to denaturing PAGE
(6% acrylamide:bisacrylamide (19:1); 8M urea in 1× TBE). Resolved products
were visualized using a Typhoon Trio Variable Mode Imager (GE Healthcare). The
DnaA-trio marker was generated by primer extension performed under the same
conditions as described for KMnO4-treated substrates, but using a PCR product
as template generated with a primer corresponding to the end of the first DnaA-trio
(5′-TAGGGCCTGTGGATTTGTG-3′). All experiments were independently
performed at least twice and representative data are shown.

Filament assembly assays (BMOE). DNA scaffolds were prepared by mixing
each oligonucleotide (50 nM final concentrations) in 10 mM HEPES-KOH
(pH 7.6), 100mM NaCl and 1mM EDTA. Mixed oligonucleotides were heated
to 98°C for 5 min in a heat-block and slowly cooled to room temperature in the
heat-block before use. Filament formation was promoted by mixing DnaA-CC
proteins (WT, I190A, ΔdomainI-II) (200 nM final concentration) with DNA scaffold
(15 nM) on a 20 µl scale in 30 mM HEPES-KOH (pH 7.0), 100mM potassium
glutamate, 100mM NaCl, 10mM magnesium acetate, 25% glycerol, 0.01% Tween-20
and 2mM nucleotide (ADP or ATP). Reactions were incubated at 37°C for
5-12 min before addition of 4mM BMOE (ThermoFisher Scientific). Reactions
were incubated at 37°C for 5-12 min before quenching by addition of 60 mM
cysteine. Reactions were incubated once more at 37°C for 10-12 min before fixing
in NuPAGE LDS sample buffer (ThermoFisher Scientific) at 98°C for 5 min.
Complexes were resolved by running 500 fmol of cross-linked DnaA from each
reaction on a NuPAGE Novex 3-8% Tris-acetate gel (ThermoFisher Scientific) then
transferred to Hybond 0.45 µm PVDF membrane (Amersham) in 0.5× NuPAGE
Tris-acetate SDS running buffer with 20% MeOH at 35 mA, 4°C overnight using
wet transfer apparatus (Bio-Rad). Complexes were visualized by western blotting
using a polyclonal anti-DnaA antibody (Eurogentec). NB: all filament assembly
assays were performed using tag-free proteins with the exception of that shown
in Extended Data Fig. 7, in which C-terminally His-tagged proteins (ΔdomainI-II-CC
and wild-type-CC) were used. All experiments were independently performed
at least twice and representative data are shown.
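
The molar amounts implied by the BMOE assay conditions can be checked with a short calculation. This is only a sketch: the helper names are hypothetical, and the loading volume is computed for the reaction before crosslinker, cysteine and sample buffer are added. Those additions dilute the sample by amounts not stated in the protocol, so the volume actually loaded would be larger.

```python
def femtomoles(conc_nM, volume_ul):
    """Amount in fmol; 1 nM is equivalent to 1 fmol per microlitre."""
    return conc_nM * volume_ul


def volume_for(amount_fmol, conc_nM):
    """Volume in microlitres that contains the requested amount."""
    return amount_fmol / conc_nM


dnaa_fmol = femtomoles(200, 20)      # DnaA-CC in a 20 µl reaction
scaffold_fmol = femtomoles(15, 20)   # DNA scaffold in the same reaction
excess = dnaa_fmol / scaffold_fmol   # DnaA molecules per scaffold
load_ul = volume_for(500, 200)       # volume holding 500 fmol DnaA

print(f"DnaA: {dnaa_fmol} fmol, scaffold: {scaffold_fmol} fmol")
print(f"DnaA per scaffold: {excess:.1f}")
print(f"Undiluted reaction volume for 500 fmol DnaA: {load_ul} µl")
```

Under these conditions DnaA is in roughly 13-fold molar excess over scaffold, and 500 fmol of DnaA is one-eighth of the protein in a reaction.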

Filament assembly assays (BS3). Filament assembly assays using
bis(sulfosuccinimidyl)suberate (BS3) were performed as described for BMOE, except a tag-free
fully wild-type recombinant DnaA protein was used for Extended Data Fig. 3.
Tag-free ‘CC’ variants of wild-type and I190A DnaA were used for Extended Data
Fig. 5a, b. Crosslinking was performed using BS3 (15 mM final concentration) in
place of BMOE and quenching performed by addition of Tris-HCl (pH 7.6) (30mM
final concentration). All experiments were independently performed at least twice
and representative data are shown.

Microscopy. To visualize GFP-DnaN, starter cultures were grown overnight in
defined minimal medium base (Spizizen minimal salts supplemented with
Fe-NH4-citrate (1 µg ml−1), MgSO4 (6mM), CaCl2 (100 µM), MnSO4 (130 µM), ZnCl2
(1 µM), thiamine (2 µM)) supplemented with casein hydrolysate (200 µg ml−1)
and glycerol (0.5%) with IPTG (1 mM) at 37°C, diluted 1:100 into fresh medium
with IPTG (1 mM) and allowed to grow at 37°C for several generations until they
reached A600 nm = 0.3. Cells were collected by centrifugation, washed to remove
IPTG, and resuspended into fresh medium at A600 nm = 0.1 and allowed to grow
until A600 nm = 0.6. Cells were mounted on 1.5% agar pads (0.5× growth media)
and a 0.13-0.17 mm glass coverslip (VWR) was placed on top. Microscopy was
performed on an inverted epifluorescence microscope (Nikon Ti) fitted with a
Plan-Apochromat objective (Nikon DM 100×/1.40 Oil Ph3). Light was transmitted
from a 300 W xenon arc-lamp through a liquid light guide (Sutter Instruments)
and images were collected using a CoolSnap HQ2 cooled CCD (charge-coupled
device) camera (Photometrics). All filters were Modified Magnetron ET Sets from
Chroma and details are available upon request. Digital images were acquired and
analysed using METAMORPH software (version 6.2r6). All experiments were
independently performed at least twice and representative data are shown.

Strains. Strains are listed in Supplementary Table 1. The genotype of all origin
mutants was confirmed by DNA sequencing.

Oligonucleotides. All oligonucleotides were purchased from Eurogentec.
Oligonucleotides used for plasmid construction are listed in Extended Data
Table 2. Oligonucleotides used to construct DNA scaffolds are listed in Extended
Data Table 3.

Plasmids. Plasmids are listed in Supplementary Table 2 (sequences are available
upon request). DH5α (F− Φ80lacZΔM15 Δ(lacZYA-argF)U169 recA1 endA1
hsdR17(rK−, mK+) phoA supE44 thi-1 gyrA96 relA1 λ−) (ref. 29) was used for plasmid
construction, except where noted. Descriptions, where necessary, are provided below.

pHM327 derivatives were generated by quickchange mutagenesis using the 
oligonucleotides listed in Extended Data Table 2. After sequencing to confirm 
mutated regions, sequences were subcloned using BglII/FspAI. 

pHM446 (bla aprE′ kan lacI Pspac-MCS ′aprE) is a derivative of pAPNC213 (ref. 30)
with a kanamycin resistance cassette replacing the spectinomycin resistance cas- 
sette (gift from H. Strahl). 

pHM453 (bla rpnA′ rpmH erm ΔincAB Pspac-dnaA′) was created in multiple
steps. First, pJS1 was generated by ligation of a HindIII-BamHI PCR product
containing the 5′ end of dnaA with pMUTIN4 (ref. 31) cut with HindIII-BamHI (gift
from J. Errington). Second, pHM396 was generated by digestion of pJS1 with PvuII
(to remove lacZ and lacI) and ligation of the vector backbone. Finally, pHM453
was generated by ligation of an AatII PCR product containing rpmH and the
5′ end of rpnA (oHM319 + oHM320 and 168CA genomic DNA as template) with
pHM396 cut with AatII.

pHM492 (bla aprE′ kan lacI Pspac-repN(oriN) ′aprE) was generated by ligation
of an EcoRI-XhoI PCR product containing repN(oriN) (oHM313 + oHM315 and
MMB208 (ref. 32) genomic DNA as template) with pHM446 cut with EcoRI-SalI.

pHM560 (bla rpnA′ rpmH erm ΔincA PdnaA ΔincB dnaA′) was generated
by ligation of an EcoRV-HindIII PCR product containing the dnaA promoter
(oHM510 + oHM511 and 168CA genomic DNA as template) with pHM453 cut
with EcoRV-HindIII.

pTR72, pTR73, pTR74, pTR102, pTR168 were generated by quickchange 
mutagenesis using the oligonucleotides listed in Extended Data Table 2. 

pTR208 was generated by two-fragment PCR. oTR384/oTR385 and oTR386/ 
oTR387 were used to amplify products using pTR74 and B. subtilis 168CA genomic
DNA as templates, respectively. An equal volume of each PCR product was mixed, 
heated to 98°C and allowed to cool to room temperature before DpnI digestion 
and transformation. 

pTR229 (bla PT7-(his6-link-Xa-dnaA)) was generated by subcloning a HindIII-
XhoI fragment of dnaA from pHM2339 into the pTR74 backbone.

pTR541 and pTR542 were generated by two-fragment PCR. oTR537 and oTR538
were used to amplify the plasmid backbone of pSG1301. oTR535 and oTR536
were used to amplify incC with B. subtilis 168CA genomic DNA and pTR84 used
as the templates for pTR541 and pTR542, respectively. An equal volume of each
PCR product was mixed, heated to 98°C and allowed to cool to room temperature
before DpnI digestion and transformation into EH3827 (asnB32 relA1 spoT1
thi-1 fuc-1 lysA ilv-192 zia::pKN500 ΔdnaA mad-1) (ref. 33). DNA sequencing confirmed
the construction of each origin including flanking sequences (>400 base pairs 
upstream and downstream). 

Extended Data Figure 1 | Structure of DnaA proteins. a, Primary
domain structure of DnaA. Key functions are listed below the relevant
domain. b, Structure of Thermotoga maritima DnaA domain III,
highlighting the single-strand binding residue Val176 (Ile190 B. subtilis)
within the ISM (PDB accession number 2Z4S). c, Structure of E. coli
DnaA domain IV bound to a DnaA-box (PDB accession number 1J1V).
d, Structure of A. aeolicus DnaA domain III (blue shades) and domain IV
(cyan shades) bound to a single DNA strand (orange), highlighting the
single-strand binding residue Val156 (Ile190 B. subtilis) (PDB accession
number 3R8B). e, Scheme used to construct mutants within the B. subtilis
DNA replication origin. The green arrow highlights the location of a
DnaA-box mutation.

Extended Data Figure 2 | Characterization of the inducible repN/oriN
replication initiation system. Repression of repN expression inhibits
DNA replication in a ΔoriC mutant. A large deletion was introduced into
the B. subtilis replication origin using a strain harbouring the inducible
oriN/repN construct. Strain growth was found to be dependent upon
addition of the inducer IPTG. a, Strains streaked to resolve single colonies.
b, A GFP-DnaN reporter was used to detect DNA replication after removal
of IPTG from inducible oriN/repN strains. Scale bar, 5 µm. c, Genetic map
indicating the location of oriN at the aprE locus in strain HM1108.
d, Analysis of DNA replication initiation at oriC and oriN. Marker
frequency analysis was used to measure the rate of DNA replication
initiation in the presence and absence of IPTG (0.1 mM). Genomic DNA
was harvested from cells during the exponential growth phase and the
relative amount of DNA from either the endogenous replication origin
(oriC) or the aprE locus (oriN) compared with the terminus (ter) was
determined using qPCR (mean and s.d. of three technical replicates). Cell
doubling times (in minutes) are shown above each data set.

Extended Data Figure 3 | Wild-type DnaA assembles into filaments
on 5′-tailed substrates. DnaA filament formation using amine-specific
crosslinking (BS3) on DNA scaffolds (represented by symbols above each
lane). Protein complexes were resolved by SDS-PAGE and DnaA was
detected by western blot analysis.

Extended Data Figure 4 | DNA sequence of unwinding regions after mononucleotide
and trinucleotide deletions. Resulting sequences grouped in
boxes are identical for more than one deletion. Figure panels: ‘Wild-type sequence
indicating deletions’ (3′-GATGATAATGAAGAT-5′) and ‘Resulting sequence’.
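
Why distinct deletions can give identical sequences is easy to verify computationally. The sketch below takes the wild-type unwinding-region sequence shown in Extended Data Fig. 4 (3′-GATGATAATGAAGAT-5′), enumerates every possible deletion of a given size and groups the start positions that yield the same product. It is an illustration only: the function name is hypothetical, and it considers all positions rather than only the deletions constructed in the study.

```python
from collections import defaultdict

# Wild-type unwinding region, written 3'->5' as in Extended Data Fig. 4.
WILD_TYPE = "GATGATAATGAAGAT"


def deletions(seq, size):
    """Map each resulting sequence to the 1-based start positions of the
    deletions of length `size` that produce it."""
    groups = defaultdict(list)
    for start in range(len(seq) - size + 1):
        product = seq[:start] + seq[start + size:]
        groups[product].append(start + 1)
    return dict(groups)


for size, label in ((1, "mononucleotide"), (3, "trinucleotide")):
    print(f"{label} deletions giving identical sequences:")
    for product, starts in deletions(WILD_TYPE, size).items():
        if len(starts) > 1:
            print(f"  3'-{product}-5'  from deletions starting at {starts}")
```

Deleting either adenine of an AA pair gives the same 14-mer, and because the region is built from similar trinucleotide repeats, sliding a trinucleotide deletion along the first repeats also yields identical 12-mers.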

Extended Data Figure 5 | Crosslinking with BS3 captures a distinct
DnaA oligomer. DnaA was incubated with various DNA scaffolds
and different crosslinking agents were added to capture distinct DnaA
oligomers. a, Crosslinking with BMOE detects DnaA oligomers forming
on both duplex and tailed substrates. b, Crosslinking with BS3 only detects
DnaA oligomers forming on tailed substrates, revealing an interaction
between DnaA and the first DnaA-trio motif located downstream of the
GC-cluster.

Extended Data Figure 6 | The nucleotide at the third position of the
DnaA-trio is required to stabilize DnaA. DNA scaffolds containing the
first two nucleotides of a DnaA-trio either with or without a 5′-phosphate
are unable to stabilize binding of an additional DnaA protomer, indicating 
that the nucleotide at the third position is required. Combined with the 
data shown in Fig. 4b where the position is abasic, the results suggest that 
the sugar at the third position plays a critical role in DnaA binding. 

Extended Data Figure 7 | Relationship between the DnaA-box and
the DnaA-trios. a, Sequence of the origin region used for constructing
DNA scaffolds. Symbols below represent duplex DnaA-boxes (triangles),
the GC-rich region (green rectangles), the two strands of the unwinding
region (red or pink rectangles) and the AT-rich region (blue rectangle).
b, Loading of the DnaA filament onto a single-stranded 5′-tail requires
a DnaA-box and DnaA domains III-IV, but the DnaA-box position and
orientation are flexible.

Extended Data Table 1 | Bacterial replication origin regions in Fig. 4c

Organism | Genome accession # | Reference or DoriC accession # (ORI) | Genome position shown in Figure 4 | Tandem DnaA-boxes (spacing)
Aquifex aeolicus VF5 | NC_000918 | ref. 17 | 166853 to 166897 | N
Bacillus subtilis 168 | NC_000964 | ref. 34 | 1860 to 1902 | Y (-1)
Bdellovibrio bacteriovorus HD100 | NC_005363 | ORI10040030 | 1569 to 1597 | N
Bifidobacterium bifidum PRL2010 | NC_014638 | ORI94010761 | 2048 to 2071 | Y (0)
Bordetella pertussis Tohama I | NC_002929 | ORI10030012 | 4084583 to 4084611 | Y (+3)
Borrelia afzelii HLJ01 | NC_018887 | ORI96010684 | 460118 to 460149 | Y? (+1)(+9)
Clostridium botulinum A str. Hall | NC_009698 | ORI92010335 | 1517 to 1552 | Y (-1)
Corynebacterium glutamicum ATCC 13032 | NC_003450 | ORI10010055 | 1984 to 2010 | Y (0)
Enterococcus faecalis V583 | NC_004668 | ORI10010096 | 1498 to 1533 | Y (0)
Escherichia coli MG1655 | NC_000913 | ref. 35 | 3925780 to 3925809 | N
Helicobacter pylori 26695 | NC_000915 | ref. 20 | 1607488 to 1607525 | Y (0)
Leuconostoc citreum KM20 | NC_010471 | ORI92310382 | 1790 to 1819 | Y (0)
Listeria monocytogenes EGD-e | NC_003210 | ORI10010047 | 1773 to 1802 | Y (-1)
Oceanobacillus iheyensis HTE831 | NC_004193 | ORI10010074 | 1746 to 1778 | Y (-1)
Staphylococcus aureus NCTC 8325 | NC_007795 | ORI10010183 | 2075 to 2104 | Y (-1)
Streptococcus pneumoniae R6 | NC_003098 | ORI10010044 | 1458 to 1487 | Y (0)
Streptomyces coelicolor A3(2) | NC_003888 | ref. 36 | 4270070 to 4270096 | Y (0)
Synechococcus elongatus PCC 7942 | NC_007604 | ref. 37 | 2695870 to 2695899 | Y (0)
Thermotoga maritima MSB8 | NC_000853 | ref. 21 | 157010 to 157040 | N
Treponema pallidum Nichols | NC_000919 | ORI10010003 | 1568 to 1588 | Y (-1)

References 17, 20, 21 and 34-37 are cited in the table.
Extended Data Table 2 | Oligonucleotides used for plasmid construction

Product | Template | Primer #1 | Sequence (5'→3') | Primer #2 | Sequence (5'→3')
pHM453 | Genomic DNA | oHM319 | AATAATGACGTCGGCAAATTGTTTGAATTTGTC | oHM320 | AATAATAGACGTCAGCCCGACACGCAGTTCATC
pHM492 | Genomic DNA | oHM313 | AATAATGAATTCTTAATTATCTAACCAATTATAAAACGGCAG | oHM315 | AATAATCTCGAGCGCTTGGCAGCACCTGAGCAAACC
pHM560 | Genomic DNA | oHM510 | AATAATGATATCTATAATGGTACCTATATAAGGCCTAGATTGTGACAACCATTG | oHM511 | AATATAAAGCTTAGAGGAAAGGTAGGATTAG
pTR54 | pHM327 | oTR36 | CTTCTACCATTATCCGTTAGGAGGATAAAAATGAAATTCAC | oTR37 | GGATAATGGTAGAAGTAATAGTAGGGCCTGTGGATTTG
pTR72 | pGS43 | oTR156 | CAGCTTAAATGAGATCCGGCTGCTAACAAAGCCCGAAAGG | oTR157 | CGGATCTCATTTAAGCTGTTCTTTAATTTCTTTTACATGC
pTR73 | pTR72 | oTR158 | CATCATCATCATCATCACAGCGAAAATATATTAGACCTGTGGAACCAAGCCCTTGCTCAAATC | oTR159 | GCTGTGATGATGATGATGATGCATGGTATATCTCCTTCTTAAAGTTAAACAAAATTATTTC
pTR74 | pTR102 | oTR189 | ATTCAAGGTCGCATGGAAAATATATTAGACCTGTGGAACCAAGCCCTTG | oTR190 | TTTCCATGCGACCTTGAATGCCGCTGCTGTGATGATGATGATGATGCATG
pTR83 | pHM327 | oTR82 | CACAGTCTTCCTTGCTGTGGATAGGCTGTGTTTCCTGTCTTTTTC | oTR83 | ATCCACAGCAAGGAAGACTGTGTATGACTTCCGAAAAGTTATTC
pTR84 | pHM328 | oTR66 | GCCCATGATAATGAAGATGATATTTTTATAAATATATATATTAATACATTATCCGTTAG | oTR67 | AAAATATCATCTTCATTATCATGGGCCTGTGGATTTGTGGATAAGTTG
pTR85 | pHM327 | oTR118 | GAAAGGCAAGGAAGCTTTTCGGAAGTCATACACAGTCTGTC | oTR119 | AAAGCTTCCTTGCCTTTCCCCGATTGATCCCCGGTCCTG
pTR86 | pHM327 | oTR123 | GAAGCTTCCTTGCGTCTGTCCACATGTGGATAGGCTGTGTTTCC | oTR124 | AGACGCAAGGAAGCTTCCGAAAAGTTATTCACACTTTCCCCGATTG
pTR87 | pHM327 | oTR125 | ACAGCAAGGAAGGCTGTGTTTCCTGTCTTTTTCACAACTTATC | oTR126 | CAGCCTTCCTTGCTGTGGACAGACTGTGTATGACTTCCGAAAAG
pTR88 | pHM327 | oTR86 | CCTGTCCTTCCTTGCACTTATCCACAAATCCACAGGCCCTACTATTAC | oTR87 | GGATAAGTGCAAGGAAGGACAGGAAACACAGCCTATCCAC
pTR89 | pHM327 | oTR88 | TCACAACCTTCCTTGCAATCCACAGGCCCTACTATTACTTCTAG | oTR89 | GTGGATTGCAAGGAAGGTTGTGAAAAAGACAGGAAACACAGCCTATC
pTR90 | pHM327 | oTR90 | CCACATTCCTTGCGGCCCTACTATTACTTCTAG | oTR91 | GGGCCGCAAGGAATGTGGATAAGTTGTGAAAAAGACAGGAAAG
pTR102 | pTR73 | oTR187 | CAGCAGCGGCATTCGAAAATATATTAGACCTGTGGAACCAAGCCCTTG | oTR188 | TTTCGAATGCCGCTGCTGTGATGATGATGATGATGCATGGTATATC
pTR146 | pHM327 | oTR247 | CAGGCCCATGTATTACTTCTACTATTTTTTATAAATATATATATTAATAC | oTR248 | AATACATGGGCCTGTGGATTTGTGGATAAGTTGTG
pTR147 | pHM327 | oTR249 | GCCCTACATATACTTCTACTATTTTTTATAAATATATATATTAATACATTATC | oTR250 | AGTATATGTAGGGCCTGTGGATTTGTGGATAAGTTGTG
pTR148 | pHM327 | oTR251 | CCCTACTATATGTTCTACTATTTTTTATAAATATATATATTAATACATTATCCGTTAG | oTR252 | GAACATATAGTAGGGCCTGTGGATTTGTGGATAAGTTGTG
pTR149 | pHM327 | oTR254 | AGTACTTGTAATAGTAGGGCCTGTGGATTTGTGGATAAG | oTR263 | CTACTATTACAAGTACTATTTTTTATAAATATATATATTAATACATTATCCGTTAGGAG
pTR150 | pHM327 | oTR255 | TACTTCATGTATTTTTTATAAATATATATATTAATACATTATCCGTTAGGAG | oTR258 | AATACATGAAGTAATAGTAGGGCCTGTGGATTTGTG
pTR153 | pHM327 | oTR261 | GATCAATCGGTATCCGTTAGGAGGATAAAAATGAAATTC | oTR262 | CTAACGGATACCGATTGATCCCCGGTCCTGCTATTTAAG
pTR168 | pTR74 | oTR299 | GAATTCGCCTGCTCTATCCGAGATAATAAATGC | oTR300 | GAGCAGGCGAATTCGTTTGTAAATTTCTCAGAAG
pTR208 | pTR74 | oTR384 | TCAAGGTCGCATGAACAAAACAGAACTTATCAATG | oTR385 | CGGATCTTATTTTCCGGCAACTGCGTCTTTAAGC
pTR208 | Genomic DNA | oTR386 | TTTTGTTCATGCGACCTTGAATGCCGCTGCTG | oTR387 | CGGAAAATAAGATCCGGCTGCTAACAAAGCCCGAAAG
pTR284 | pHM327 | oTR449 | GGCCCAACTATTACTTCTACTATTTTTTATAAATATATATATTAATACATTATC | oTR450 | GTAATAGTTGGGCCTGTGGATTTGTGGATAAG
pTR285 | pHM327 | oTR451 | GGCCCTTCTATTACTTCTACTATTTTTTATAAATATATATATTAATACATTATC | oTR452 | GTAATAGAAGGGCCTGTGGATTTGTGGATAAG
pTR286 | pHM327 | oTR453 | GGCCCTAGTATTACTTCTACTATTTTTTATAAATATATATATTAATACATTATCCG | oTR454 | GTAATACTAGGGCCTGTGGATTTGTGGATAAG
pTR287 | pHM327 | oTR455 | CCCTACAATTACTTCTACTATTTTTTATAAATATATATATTAATACATTATCCGTTAG | oTR456 | AGTAATTGTAGGGCCTGTGGATTTGTGG
pTR288 | pHM327 | oTR457 | CCCTACTTTTACTTCTACTATTTTTTATAAATATATATATTAATACATTATCCGTTAG | oTR458 | GAAGTAAAAGTAGGGCCTGTGGATTTGTGGATAAG
pTR289 | pHM327 | oTR459 | CCCTACTAATACTTCTACTATTTTTTATAAATATATATATTAATACATTATCCGTTAG | oTR460 | GTATTAGTAGGGCCTGTGGATTTGTGGATAAG
pTR301 | pHM327 | oTR469 | CTTCTACTTTTTATAAATATATATATTAATACATTATCCGTTAGGAGGATAAAAATG | oTR470 | TATAAAAAGTAGAAGTAATAGTAGGGCCTGTGG
pTR302 | pHM327 | oTR471 | TTACTTCTTTTTATAAATATATATATTAATACATTATCCGTTAGGAGGATAAAAATG | oTR472 | TATAAAAAGAAGTAATAGTAGGGCCTGTGGATTTG
pTR303 | pHM327 | oTR473 | CTATTACTTTTTATAAATATATATATTAATACATTATCCGTTAGGAGGATAAAAATG | oTR474 | TATAAAAAGTAATAGTAGGGCCTGTGGATTTG
pTR304 | pHM327 | oTR475 | CAGGCCCTACTATTTTTTATAAATATATATATTAATACATTATCCGTTAGGAG | oTR476 | AATAGTAGGGCCTGTGGATTTGTGGATAAG
pTR305 | pHM327 | oTR477 | GCCCTACTTTTTATAAATATATATATTAATACATTATCCGTTAGGAGGATAAAAATG | oTR478 | TTATAAAAAGTAGGGCCTGTGGATTTGTG
pTR306 | pHM327 | oTR479 | GGCCCTTTTTATAAATATATATATTAATACATTATCCGTTAGGAGGATAAAAATG | oTR480 | TATAAAAAGGGCCTGTGGATTTGTGG
pTR346 | pHM327 | oTR490 | GGCCCACTATTACTTCTACTATTTTTTATAAATATATATATTAATAC | oTR491 | GAAGTAATAGTGGGCCTGTGGATTTGTGGATAAGTTG
pTR348 | pHM327 | oTR494 | GGCCCTATTACTTCTACTATTTTTTATAAATATATATATTAATACATTATC | oTR495 | AGAAGTAATAGGGCCTGTGGATTTGTGGATAAGTTG
pTR349 | pTR54 | oTR496 | ATTACTTCCATTATCCGTTAGGAGGATAAAAATGAAATTC | oTR497 | GATAATGGAAGTAATAGTAGGGCCTGTGGATTTG
pTR350 | pTR54 | oTR498 | ACTATTACCATTATCCGTTAGGAGGATAAAAATG | oTR499 | GGATAATGGTAATAGTAGGGCCTGTGGATTTGTG
pTR377 | pHM327 | oTR539 | GGCCCTATATTACTTCTACTATTTTTTATAAATATATATATTAATACATTATC | oTR540 | GTAATATAGGGCCTGTGGATTTGTG
pTR441 | pHM327 | oTR659 | CCCTACATTACTTCTACTATTTTTTATAAATATATATATTAATACATTATC | oTR660 | GTAGAAGTAATGTAGGGCCTGTGGATTTGTG
pTR443 | pHM327 | oTR663 | CCCTACTACTTCTACTATTTTTTATAAATATATATATTAATACATTATC | oTR664 | GTAGAAGTAGTAGGGCCTGTGGATTTGTG
pTR452 | pHM326 | oTR698 | GGCCGTACTATTACTTCTACTATTTTTTATAAATATATATATTAATAG | oTR699 | GTAATAGTACGGCCTGTGGATTTGTGG
pTR453 | pHM327 | oTR700 | AGGCCTACTATTACTTCTACTATTTTTTATAAATATATATATTAATAC | oTR701 | GTAATAGTAGGCCTGTGGATTTGTGG
pTR456 | pHM327 | oTR706 | GCCCTCTATTACTTCTACTATTTTTTATAAATATATATATTAATACATTATC | oTR707 | AAGTAATAGAGGGCCTGTGGATTTGTG
pTR457 | pHM327 | oTR708 | CCCTACTTTACTTCTACTATTTTTTATAAATATATATATTAATACATTATCC | oTR709 | GAAGTAAAGTAGGGCCTGTGGATTTG
pTR458 | pHM327 | oTR710 | CCTACTATACTTCTACTATTTTTTATAAATATATATATTAATACATTATCC | oTR711 | TAGAAGTATAGTAGGGCCTGTGGATTTG
pTR478 | pHM327 | oTR801 | TACTATTCTTCTACTATTTTTTATAAATATATATATTAATACATTATCCGTTAG | oTR802 | GTAGAAGAATAGTAGGGCCTGTGGATTTGTG
pTR541 | Genomic DNA | oTR535 | ATTACGCCAGCTAGTGCTTTTATTTCTTGCAACCATAATAG | oTR536 | ATTAATGCAGTTTTATCCTCCTAACGGATAATG
pTR541 | pSG1301 | oTR537 | AAAGCACTAGCTGGCGTAATAGCGAAGAGG | oTR538 | GAGGATAAAACTGCATTAATGAATCGGCCAACG
pTR542 | pTR84 | oTR535 | ATTACGCCAGCTAGTGCTTTTATTTCTTGCAACCATAATAG | oTR536 | ATTAATGCAGTTTTATCCTCCTAACGGATAATG
pTR542 | pSG1301 | oTR537 | AAAGCACTAGCTGGCGTAATAGCGAAGAGG | oTR538 | GAGGATAAAACTGCATTAATGAATCGGCCAACG
Extended Data Table 3 | Oligonucleotides used to assemble DNA scaffolds

Oligo 1 | Sequence (5'→3') | Oligo 2 | Sequence (5'→3') | Figure
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR603 | GGGCCTGTGGATTTGTGGATAAGT | 2c, 2f, 4a, E5a, E5b
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR609 | ATATATATTTATAAAAAATAGTAGAAGTAATAGTAGGGCCTGTGGATTTGTGGATAAGT | 2c, E3, E7b
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR619 | ATATATATTTATAAAAATATCATCTTCATTATCATGGGCCTGTGGATTTGTGGATAAGT | 2c
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR626 | AGGGCCTGTGGATTTGTGGATAAGT | 4a
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR627 | TAGGGCCTGTGGATTTGTGGATAAGT | 4a, E5a, E5b
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR628 | GTAGGGCCTGTGGATTTGTGGATAAGT | 4a
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR629 | AGTAGGGCCTGTGGATTTGTGGATAAGT | 4a, E6
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR630 | TAGTAGGGCCTGTGGATTTGTGGATAAGT | 4a, 4b, E5a, E5b, E6
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR631 | ATAGTAGGGCCTGTGGATTTGTGGATAAGT | 4a
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR632 | AATAGTAGGGCCTGTGGATTTGTGGATAAGT | 4a
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR633 | TAATAGTAGGGCCTGTGGATTTGTGGATAAGT | 4a, 2f
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR634 | GTAATAGTAGGGCCTGTGGATTTGTGGATAAGT | 4a
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR645 | AGTAATAGTAGGGCCTGTGGATTTGTGGATAAGT | 4a
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR646 | AAGTAATAGTAGGGCCTGTGGATTTGTGGATAAGT | 4a, 3d
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR647 | GAAGTAATAGTAGGGCCTGTGGATTTGTGGATAAGT | 4a
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR648 | AGAAGTAATAGTAGGGCCTGTGGATTTGTGGATAAGT | 4a, 3d
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR649 | TAGAAGTAATAGTAGGGCCTGTGGATTTGTGGATAAGT | 4a
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR754 | TATAAAAAATAGTAGAAGTAATAGTAGGGCCTGTGGATTTGTGGATAAGT | 2f
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR831 | TAGTAGAAGTAATAGTAGGGCCTGTGGATTTGTGGATAAGT | 2f
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR854 | TAGTAGTAGTAGTAGAAGTAATAGTAGGGCCTGTGGATTTGTGGATAAGT | 2f
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR956 | AAGAAGTAATATAGGGCCTGTGGATTTGTGGATAAGT | 3d
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR957 | AAGAAGTAATGTAGGGCCTGTGGATTTGTGGATAAGT | 3d
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR958 | AAGAAGTAAAGTAGGGCCTGTGGATTTGTGGATAAGT | 3d
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR959 | AAGAAGTATAGTAGGGCCTGTGGATTTGTGGATAAGT | 3d
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR960 | AAGAAGAATAGTAGGGCCTGTGGATTTGTGGATAAGT | 3d
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR961 | AAGAAGTAATAGGGCCTGTGGATTTGTGGATAAGT | 3d
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR962 | AAGAAGTAGTAGGGCCTGTGGATTTGTGGATAAGT | 3d
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR965 | pAGTAGGGCCTGTGGATTTGTGGATAAGT | E6
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR970 | AAGTAGGGCCTGTGGATTTGTGGATAAGT | 4b
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR971 | TAGTAGGGCCTGTGGATTTGTGGATAAGT | 4b
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR972 | TASTAGGGCCTGTGGATTTGTGGATAAGT | 4b
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR995 | TAATAGGGCCTGTGGATTTGTGGATAAGT | 4b
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR996 | TACTAGGGCCTGTGGATTTGTGGATAAGT | 4b
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR998 | TGGTAGGGCCTGTGGATTTGTGGATAAGT | 4b
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR999 | TTGTAGGGCCTGTGGATTTGTGGATAAGT | 4b
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR1001 | CAGTAGGGCCTGTGGATTTGTGGATAAGT | 4b
oTR602 | ACTTATCCACAAATCCACAGGCCC | oTR1002 | AAGTAGGGCCTGTGGATTTGTGGATAAGT | 4b
oTR603 | GGGCCTGTGGATTTGTGGATAAGT | oTR608 | ACTTATCCACAAATCCACAGGCCCTACTATTACTTCTACTATTTTTTATAAATATATAT | 2c, E3
oTR608 | ACTTATCCACAAATCCACAGGCCCTACTATTACTTCTACTATTTTTTATAAATATATAT | oTR609 | ATATATATTTATAAAAAATAGTAGAAGTAATAGTAGGGCCTGTGGATTTGTGGATAAGT | 2c
oTR611 | ACCTTCCTTGCTTCCTTGCGGCCC | oTR612 | ATATATATTTATAAAAAATAGTAGAAGTAATAGTAGGGCCGCAAGGAAGCAAGGAAGGT | 2c, E7b
oTR665 | ATATATATTTATAAAAAATAGTAGAAGTAATAGTAGGGCCTGTGGATTGCAAGGAAGGT | oTR667 | ACCTTCCTTGCAATCCACAGGCCC | E7b
oTR666 | ATATATATTTATAAAAAATAGTAGAAGTAATAGTAGGGCCGCAAGGAATGTGGATAAGT | oTR668 | ACTTATCCACATTCCTTGCGGCCC | E7b
oTR946 | ACTGTGGATTTGTGGATAAGGCCC | oTR623 | ATATATATTTATAAAAAATAGTAGAAGTAATAGTAGGGCCTTATCCACAAATCCACAGT | E7b
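
The pairing between the two oligonucleotides in each row of Extended Data Table 3 can be checked from the sequences alone. The following is a minimal Python sketch, assuming that the 3' end of Oligo 2 anneals to the full length of Oligo 1 so that any extra 5' bases of Oligo 2 remain single-stranded; the function names are illustrative and are not taken from the study.

```python
# Watson-Crick complement of each DNA base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def five_prime_tail(oligo1: str, oligo2: str) -> str:
    """Return the unpaired 5' tail of oligo2 when its 3' end pairs with all of oligo1.

    Raises ValueError if the 3' end of oligo2 is not the reverse complement of oligo1.
    """
    duplex = reverse_complement(oligo1)
    if not oligo2.upper().endswith(duplex):
        raise ValueError("oligo2 does not end with the reverse complement of oligo1")
    return oligo2[: len(oligo2) - len(duplex)]


# Sequences copied from Extended Data Table 3.
OTR602 = "ACTTATCCACAAATCCACAGGCCC"
OTR603 = "GGGCCTGTGGATTTGTGGATAAGT"
OTR630 = "TAGTAGGGCCTGTGGATTTGTGGATAAGT"

print(repr(five_prime_tail(OTR602, OTR603)))  # '' (fully complementary pair)
print(repr(five_prime_tail(OTR602, OTR630)))  # 'TAGTA' (five-base 5' tail)
```

Under this assumption, the oTR602/oTR603 pair forms a fully base-paired duplex, whereas the longer Oligo 2 variants leave tails of increasing length.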
Structural basis for amino acid export by DMT
superfamily transporter YddG

Hirotoshi Tsuchiya, Shintaro Doki, Mizuki Takemoto, Tatsuya Ikuta, Takashi Higuchi, Keita Fukui, Yoshihiro Usuda,
Eri Tabuchi, Satoru Nagatoishi, Kouhei Tsumoto, Tomohiro Nishizawa, Koichi Ito, Naoshi Dohmae, Ryuichiro Ishitani &
Osamu Nureki


The drug/metabolite transporter (DMT) superfamily is a large 
group of membrane transporters ubiquitously found in eukaryotes, 
bacteria and archaea, and includes exporters for a remarkably wide 
range of substrates, such as toxic compounds and metabolites’. 
YddG is a bacterial DMT protein that expels aromatic amino acids 
and exogenous toxic compounds, thereby contributing to cellular 
homeostasis’. Here we present structural and functional analyses 
of YddG. Using liposome-based analyses, we show that Escherichia 
coli and Starkeya novella YddG export various amino acids. The 
crystal structure of S. novella YddG at 2.4 Å resolution reveals a new
membrane transporter topology, with ten transmembrane segments 
in an outward-facing state. The overall structure is basket-shaped, 
with a large substrate-binding cavity at the centre of the molecule, and 
is composed of inverted structural repeats related by two-fold pseudo- 
symmetry. On the basis of this intramolecular symmetry, we propose 
a structural model for the inward-facing state and a mechanism of the 
conformational change for substrate transport, which we confirmed 
by biochemical analyses. These findings provide a structural basis for 
the mechanism of transport of DMT superfamily proteins. 

An important physiological function is the expulsion of various com- 
pounds from cells to the extracellular space, which is essential for cellular 
homeostasis. This process involves specific membrane transporters 
that export their substrates across the cellular membrane. An example 
is the exporters of toxic compounds, which are crucial for the growth 
of microorganisms in the presence of antibiotics and antiseptics*”. 
These drug exporters cause the emergence of multi-drug-resistant 
strains, which are a major obstacle to the effective treatment of bacterial 
infections®’. Furthermore, the exporters of metabolites, such as amino 
acids and sugars, are important for maintaining their appropriate 
concentrations in the cytosol®"!°. These metabolite transporters have 
crucial roles in multicellular organisms to direct metabolites to their 
appropriate locations, including tissues and cellular compartments. 

The DMT superfamily is a large group of membrane transporters, 
comprising more than 32 families’. Numerous members of this 
superfamily are involved in the export of a wide range of substrates, 
including drugs and metabolites, and DMT proteins are ubiquitously 
distributed in eukaryotes, bacteria and archaea. For example, nucleotide 
sugar transporter family proteins export nucleotide-sugar conjugates 
(such as UDP-galactose and CMP-sialate) to the Golgi apparatus and 
endoplasmic reticulum of eukaryotic cells to supply building blocks for 
the sugar chains of glycoproteins, glycolipids and polysaccharides'". 
Many DMT proteins are predicted to contain ten transmembrane 
segments with a five-transmembrane unit internal repeat, which was 
probably formed by gene duplication’!*-'°. Despite the importance of 
the DMT superfamily proteins, their structural mechanism of drug/ 
metabolite transport has remained elusive. 


YddG, a bacterial inner-membrane protein belonging to the DMT 
superfamily, exports drugs and metabolites. YddG from Salmonella 
enterica sv. Typhimurium is involved in the efflux of the di-cationic 
herbicide methyl viologen, and is postulated to be important for the 
efflux of multiple toxic compounds’. E. coli YddG (EcYddG) exports 
aromatic amino acids, and is essential for alleviating the growth inhibi- 
tion caused by their excessive cytosolic accumulation?. To explore the 
substrate specificity of YddG further, we performed an in vitro func- 
tional analysis using YddG-reconstituted proteoliposomes (Fig. 1a). 
The results showed that EcYddG transports various amino acids, 
including threonine, methionine, lysine and glutamic acid (Fig. 1b), 
suggesting the broad substrate specificity of EcYddG. Although we 
could not examine the transport activity of aromatic amino acids in 
the in vitro assay system, because of their low solubility, YddG probably 
transports aromatic amino acids as well, based on a previous genetic 
analysis*. We also confirmed the in vivo amino acid export activity of 
EcYddG by a metabolomics analysis (Extended Data Fig. 1). We iden- 
tified YddG from S. novella (SnYddG, 287 amino acids, 28% sequence 
identity with EcYddG) as a suitable candidate for structural studies, by 
the fluorescence-based screening method'*””. The transport activity of 
SnYddG was confirmed using the in vitro assay system (Fig. 1), which 
showed that SnYddG is also an amino acid transporter with broad 
substrate specificity. 
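
As a rough illustration of how a figure such as the 28% sequence identity quoted above can be obtained from a pairwise alignment, the Python sketch below counts identical residues over the aligned (non-gap) columns. The toy sequences, the function name and the choice of denominator are assumptions made for illustration only; the alignment program and the identity definition used in this study are not stated in this section.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns where both sequences contain a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("no aligned residue pairs")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical six-column alignment (not real YddG sequences).
print(percent_identity("MKT-AL", "MRTQAL"))  # 80.0
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so reported identities are only comparable when the same definition is used.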

We determined the crystal structure of SnYddG by the single- 
isomorphous replacement with anomalous scattering method, using 
Hg-derivatized crystals, and refined it at 2.4 Å resolution (Extended
Data Table 1, Extended Data Fig. 2a, b). The asymmetric unit of the crys- 
talline lattice contains six SnYddG molecules with nearly identical struc- 
tures, and they are superimposable with root mean square deviations 
Figure 1 | Functional characterization of YddG. a, b, Liposome-based
analyses of EcYddG and SnYddG transport activities. a, Time-dependent
uptake of [14C]threonine into empty, EcYddG- and SnYddG-containing
liposomes. b, Uptake of 14C-labelled amino acids into empty, EcYddG- and
SnYddG-containing liposomes, after 30 min. c.p.m., counts per minute.
Error bars, s.d.; n = 3. **P < 0.01; ***P < 0.001 compared with empty
liposomes (Student's t-test).
Figure 2 | Overall structure of S. novella YddG. a, Overall structure in
cylinder representations, viewed from the plane of the membrane (left) and
the periplasmic side (right). b, Topology diagram of SnYddG, coloured as in a.
The primary structural organization of the transmembrane segments and
the N and C halves is illustrated at the bottom. c, Structural superimposition
of the N and C halves of SnYddG, viewed from two different directions.
In the right panel, the transmembrane segments that do not overlap in
the superimposition are indicated by arrows. All molecular graphics were
illustrated with CueMol (http://www.cuemol.org/).


(r.m.s.d.) of less than 0.9 Å. Thus, we hereafter focus on molecule B
in the asymmetric unit, as the quality of its electron density is the best 
among these molecules. The overall structure of YddG is basket-shaped, 
with a deep cavity facing the extracellular solvent (Fig. 2a, Extended 
Data Fig. 2c). As expected from previous informatics and biochemical 
analyses!§, YddG comprises 10 α-helical transmembrane segments,
with its N and C termini located on the intracellular side. The topology 
is composed of four pairs of two consecutive transmembrane segments 
forming two-helix hairpins; that is, transmembrane (TM) 1-TM2, 
TM3-TM4, TM6-TM7 and TM8-TM9, which are arranged alternately 
to surround the central cavity (Fig. 2b). Namely, the transmembrane 
segments in the N-half (TM1-TM5) and the C-half (TM6-TM10)
surround the central cavity in anticlockwise and clockwise manners, 
respectively, as viewed from the periplasmic side (Fig. 2a, b). TM5 and 
TM10 form a four-helix bundle together with TM4 and TM9, which 
seals one side of the central cavity. TM4 and TM9 are respectively 
interrupted by short loops, with sequences that are well conserved 
among the YddG proteins from other species (Extended Data Fig. 3a), 
and thereby form the short helical segments, TM4a, TM4b, TM9a and 
TM9b. The N and C halves of SnYddG share weak sequence similarity
(Extended Data Fig. 3b). Accordingly, the structures of these two halves 
are related by two-fold pseudo-symmetry with an axis running parallel 
to the membrane, and superimpose well with an r.m.s.d. of 2.7 Å for
90 Cα atoms (Fig. 2c). Notably, the resulting topology of YddG is
unique and completely different from those of membrane transporters 
with known structures. 
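
For readers less familiar with the r.m.s.d. values quoted for structural superimpositions, the Python sketch below shows the underlying formula: the square root of the mean squared distance between equivalent atoms. It assumes that the two coordinate sets have already been superimposed and paired atom by atom; finding the optimal superposition (for example with the Kabsch algorithm) is not shown, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (same units as the input) between paired 3D points.

    Both arguments are sequences of (x, y, z) tuples listed in the same atom order
    and assumed to be superimposed already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical example: every atom is displaced by 1 Å along z, so the r.m.s.d. is 1 Å.
half_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
half_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
print(rmsd(half_a, half_b))  # 1.0
```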
Previous bioinformatics analyses suggested that the small multidrug
resistance (SMR) family is the progenitor of the DMT proteins’!*-"°. 
E. coli EmrE is the best-characterized member of the SMR family!?”°. 
The crystal structure of EmrE at 3.8 Å resolution revealed its dimeric
architecture, with four-transmembrane segment protomers”. Although 
there is no detectable sequence similarity and their transmembrane 
topologies are different, the superimposition of the transmembrane 
helices of YddG and the EmrE dimer (PDB accession 3B5D) revealed good 
structural alignment (r.m.s.d. of 2.9 Å over 127 Cα atoms) (Extended
Data Fig. 4a-c). The superimposition suggests a possible evolutionary
relationship between the four-transmembrane SMR and other ten- 
transmembrane DMT proteins (see Supplementary Information). 

The central cavity of YddG deeply penetrates the inner leaflet of the 
membrane, and is formed by six transmembrane segments: TM1, TM3, 
TM4, TM6, TM8 and TM9 (Fig. 3a). Notably, TM1, TM4, TM6 and 
TM9 contain several residues conserved among YddGs from other 
species (Extended Data Fig. 3a). At the centre of the molecule, the 
strictly conserved Trp residues, Trp17 (TM1), Trp101 (TM4a), and 
Trp163 (TM6), form the bottom of the cavity (Fig. 3a). The wall of 
the central cavity is created by the conserved hydrophobic residues, 
Leu20 (TM1), Phe40 (TM2), and Phe225 (TM8) (Fig. 3a). Notably, a 
large density blob is observed in the central cavity (Fig. 3a). The shape 
of this density fits well with the monoolein molecule used in the LCP 
crystallization, suggesting that monoolein is bound to this site. This 
density peak interacts with the conserved residues, including Trp17, 
Tyr78, Trp101 and Trp163. Thus, we propose that this cavity functions
as a substrate-binding pocket, where these conserved hydrophobic res- 
idues bind the hydrophobic groups of the substrates. Moreover, Tyr78 
(TM3), Tyr82 (TM3), and Tyr99 (TM4) are located in the central cavity
(Fig. 3a), and may provide both hydrophobic and hydrophilic environ- 
ments for substrate binding. In addition, several hydrophilic residues, 
including His79 (TM3), Ser244 (TM9) and Ser251 (TM9), are also
present in the central cavity (Fig. 3a), and may provide binding sites for 
the hydrophilic groups of the substrates. 

To explore the functional importance of these conserved residues 
for the substrate recognition and transport activity, we measured the 
transport activities of SnYddG mutants by a liposome-based assay, 
using 14C-labelled threonine and methionine (Fig. 3b, c). The structural
integrity of the mutant proteins was verified by gel-filtration chroma- 
tography at the final purification step (Extended Data Fig. 5a). The 
activity was normalized to the amount of protein reconstituted into the 
proteoliposomes (Extended Data Fig. 5b). The results showed that the 
His79Ala mutant abolished the transport activities for both threonine 
and methionine, thus revealing the critical role of this hydrophilic res- 
idue. Furthermore, the Trp101Ala and Trp163Ala mutants exhibited 
decreased transport activities for threonine, but not for methionine, 
suggesting the importance of these aromatic residues for specific types 
of substrates. In contrast, the Tyr78Ala mutation showed moderate 
effects on the transport activities of both threonine and methionine, 
suggesting that Tyr78 is involved in, but not crucial for, the recognition 
and/or transport of these substrates. The Tyr82Ala mutation enhanced
methionine transport, and slightly reduced threonine transport. This 
mutation could increase the size of the substrate-binding site in the 
inward-facing state, which may facilitate the transport of large sub- 
strates. Taken together, our results strongly suggest that the central 
cavity functions as the binding site for a wide range of YddG substrates. 
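
The normalization described above (activity per amount of reconstituted protein, expressed as a percentage of the wild type, with s.d. from n = 3) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the use of the mean wild-type value as the 100% reference and all numbers are hypothetical, and any background correction applied in the actual analysis is not reproduced here.

```python
from statistics import mean, stdev


def relative_activity(cpm_replicates, protein_amount, wild_type_cpm, wild_type_protein):
    """Return (mean, s.d.) of mutant activity as a percentage of wild-type activity.

    Counts are first divided by the amount of reconstituted protein, and each
    replicate is then scaled to the mean protein-normalized wild-type value.
    """
    wild_type_specific = mean(wild_type_cpm) / wild_type_protein
    percents = [
        100.0 * (cpm / protein_amount) / wild_type_specific for cpm in cpm_replicates
    ]
    return mean(percents), stdev(percents)


# Hypothetical counts per minute for three replicates each (not data from the paper).
wild_type = [1000.0, 1000.0, 1000.0]
mutant = [400.0, 500.0, 600.0]
print(relative_activity(mutant, 1.0, wild_type, 1.0))  # mean 50 %, s.d. 10 %
```

Dividing by the protein amount before comparing with the wild type keeps a poorly reconstituted mutant from being scored as a poorly active one.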

The intracellular side of the central cavity is closed by the intracel- 
lular gate, which is formed by the interactions among the side-by-side 
helices of TM4b and TM9a, and the intracellular tips of the TM6-TM7 
and TM8-TM9a hairpins (Fig. 2a). Just beneath the central cavity, the 
conserved Trp228 (TM8) and Met232 (TM8) residues form the hydro- 
phobic core of the intracellular gate (Fig. 3d). Trp228 hydrophobically 
interacts with the conserved Trp17 and Trp163, which form the bottom 
of the central cavity. The NE1 atom of the Trp228 side chain hydro- 
gen bonds with the Ser167 (TM6) side chain. Furthermore, Met232 
forms the hydrophobic core with Ile105 (TM4b), Val168 (TM6) and 
Figure 3 | Central cavity and intracellular gate. a, The structure of
the central cavity. In the right panel, the monoolein molecule bound to
the cavity and its mFo − DFc omit map, contoured at 3σ, are shown.
b, c, Liposome-based analyses of SnYddG mutants of the central cavity,
using threonine (b) and methionine (c) as substrates. The substrate
uptake activities were plotted as percentages of the wild-type transport
activity. Error bars, s.d.; n = 3. d-f, The structure of the intracellular gate.
The mFo − DFc maps, calculated by omitting the key residues for the
interactions and contoured at 3σ, are shown.


Val237 (TM9a), which are weakly conserved as hydrophobic residues 
among the YddGs from other species. In the vicinity of this hydropho- 
bic core, the main-chain carbonyl group of Phe225 and the side chains 
of His70, Tyr166, Ser170 and Asp229 form a hydrogen-bonding net- 
work (Fig. 3e). Water molecules are captured by this hydrogen-bonding 
network, suggesting that the interactions in the intracellular gate 
are impermanent and can dissociate during the transport cycle. 
Furthermore, Arg171 (TM6) and Lys233 (TM8) form hydrogen bonds 
with the main-chain carbonyl groups in the TM8-TM9a and TM6- 
TM7 loops, respectively, which seal the intracellular side of the intra- 
cellular gate (Fig. 3f). Together, these tight interactions separate the 
central cavity from the intracellular space. 

While the present crystal structure of YddG represents the outward- 
facing state in the alternating-access mechanism, the structural and 
sequence similarities between the N and C halves (Fig. 2c and Extended 
Data Fig. 3b) allowed us to generate a feasible structural model for the 
inward-facing state, as in other secondary transporters with inverted 
structural repeats”*-4 (Fig. 4a, b). In this model of the inward-facing 
state, the intracellular gate interactions observed in the outward-facing 
structure are completely dissociated, thus opening the pathway directed 
towards the intracellular side (Fig. 4a). In contrast, the extracellular gate 
is formed by the interactions among the TM1-TM2 and TM3-TM4a 
hairpins, and TM4a and TM9b without any obvious steric clashes, and 
thus the substrate-binding site is occluded from the extracellular side. 
In the crystal structure of the outward-facing state, TM9a contains 
the Gly241 residue, which enables the tight side-by-side interaction 
between TM4b and TM9a (Fig. 4a). TM4a also contains the Gly95 
residue, which could enable a similar side-by-side interaction between 
TM4a and TM9b in the inward-facing structure (Fig. 4b). Moreover, 
hydrophobic packing interactions are probably formed between the 
extracellular tips of the TM1-TM2 and TM3-TM4a hairpins, to create 
Figure 4 | The outward-inward conformational change of SnYddG.
a, Crystal structure of the outward-facing state (left) and the modelled
structure of the inward-facing state (right), viewed from the cytoplasmic
side. The residues involved in the intracellular gate are shown in stick
models. b, The same structures as in a, viewed from the periplasmic side.
The residues possibly involved in the extracellular gate are shown in
stick models. c, Proposed transport mechanism, illustrating the bending
motion of the transmembrane segments during the outward-inward
conformational change. The molecular envelopes are indicated by
dotted curves.


the extracellular gate. The side chain of Leu86 (TM3) may be sur- 
rounded by hydrophobic residues, including Ala21 (TM1) and Tyr82 
(TM3), and Tyr82 may form the top of the substrate-binding site in the 
inward-open state (Fig. 4b). The results of the Cys-crosslinking exper- 
iment, as well as an evolutionary covariation analysis of YddG homo- 
logues, provide strong support for the extracellular gate formation 
and our inward-facing model structure (Extended Data Figs 6 and 7; 
see Supplementary Information for further discussion). 

A comparison between the inward-facing model and outward-facing 
crystal structures provides further insights into the structural changes 
that occur during the transport cycle (Fig. 4c and Supplementary 
Video 1). The structures of TM3 and TM4 suggest the bending and 
straightening of the extracellular halves of these transmembrane 
segments, with the region around Gly71-Gly77 in TM3 (Extended 
Data Fig. 8a), as well as the intra-membrane loop in TM4, serving 
as hinges. The bending and straightening of the TM3-TM4 hairpin 
may further involve the tilting and upright motions of TM6, which 
collectively close and open the extracellular entrance of the cen- 
tral cavity (Fig. 4c). Similar structural changes may occur in TM8, 
TM9 and TM1, which are related by the intramolecular pseudo- 
symmetry to TM3, TM4 and TM6. Along with the tilting and 
upright motions of TM1, the hinge motions in the TM8-TM9 
hairpin occur around Gly217-Gly222 in TM8 (Extended Data Fig. 8b) 
and the intra-membrane loop in TM9. These structural changes 
collectively close and open the intracellular entrance of the central cavity 
(Fig. 4c). The results from the molecular dynamics simulations also 
supported this structural change mechanism (Extended Data Fig. 9; 
see Supplementary Information for further discussion). 

In summary, we determined the crystal structure of SnYddG at
2.4 Å resolution, which revealed a novel membrane transporter
topology with ten transmembrane segments. The structural and
complementary functional analyses suggested that YddG operates 
by a unique type of alternating-access mechanism, which is com- 
pletely different from those of other known transporters. Our results 
provide further insight into the common transport mechanism 
shared among the DMT superfamily members, including the SMR 
transporters. 

Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Cloning and expression of YddG. The S. novella yddG gene (gi:502932551) was 
cloned from S. novella genomic DNA (Strain: JCM 20403) into a plasmid derived 
from the pET expression vector, which includes a C-terminal (His)g-tag and a 
tobacco etch virus (TEV) protease cleavage site. The SnYddG protein was over- 
expressed in E. coli Rosetta2 (DE3) strain cells, grown in LB medium containing 
ampicillin (50 µg ml⁻¹). When the culture reached an absorbance at 600 nm of
~0.5, the cells were induced with 0.5 mM isopropyl β-D-thiogalactopyranoside
(IPTG) for 2 h at 37°C. The E. coli yddG gene (gi:152031741) was cloned from
E. coli K-12 genomic DNA (strain: JCM 20135) into a plasmid derived from the 
expression vector pCGFP-BC', which includes a C-terminal green fluorescent 
protein (GFP), a (His)s-tag and a TEV protease cleavage site. The EcYddG protein 
was overexpressed in E. coli C41(DE3) ΔacrB cells, grown in LB medium containing
ampicillin (50 µg ml⁻¹). When the culture reached an absorbance at 600 nm of
~0.5, the cells were induced with 0.5 mM IPTG for 18 h at 20°C.

Purification and crystallization of SnYddG. The SnYddG protein for crystallization
was purified according to the following procedure at 4°C. The cells
were pelleted by centrifugation at 4,500g, and were disrupted by a Microfluidizer 
(Microfluidics). After centrifugation (12,000g), the supernatant was ultra- 
centrifuged (200,000g), and the membrane fraction was collected. The proteins 
were solubilized from the membrane fraction with 50 mM HEPES (pH 7.0), con- 
taining 300 mM NaCl, 20 mM imidazole, 1 mM phenylmethylsulfonyl fluoride, 
1.2% (w/v) DDM, 0.24% (w/v) cholesteryl hemisuccinate (CHS), and were purified 
by the following three chromatography steps. The insoluble material was removed 
by ultracentrifugation (Beckman Type 70 Ti rotor, 150,000g, 30 min), and the 
supernatant was mixed with Ni-NTA resin (QIAGEN). The (His)s-tag was cleaved 
by TEV protease at 4°C overnight, and the proteins were re-chromatographed 
on a Ni-NTA column. The (His)s-tag-cleaved protein was further purified by
gel-filtration chromatography (Superdex 200 Increase 10/300 GL, GE Healthcare) 
in 20 mM HEPES (pH 7.0), containing 150 mM NaCl, 0.03% (w/v) DDM and
0.006% (w/v) CHS. 

For crystallization, the purified protein was concentrated to approximately 
15 mg ml⁻¹, using an Amicon Ultra 50K filter (Millipore). SnYddG was mixed 
with liquefied monoolein (Sigma) in a 2:3 protein to lipid ratio (w/v), using the 
twin-syringe mixing method. For the sandwich-drop crystallization, aliquots of the 
protein-LCP mixture were dispensed onto 96-well glass plates and overlaid with 
the precipitant solution, using a Gryphon LCP (Art Robbins Instruments, LLC). 
Initial crystallization conditions were searched, using screening kits including 
MemMeso, MemGold I and II, and MemStart/MemSys (Molecular Dimensions). 
The initial hits were optimized by changing the concentration of each component, 
as well as additive screening, using the hanging-drop crystallization method. For 
the hanging-drop crystallization, the protein-LCP drops were manually spotted 
onto siliconized glass coverslips and overlaid with the precipitant solutions, and 
then the coverslips were placed upside down onto 24-well plates and sealed with 
each well containing 300 µl of reservoir solution. We finally found that the addition 
of (NH4)2SO4 to the precipitant solution markedly improved the size of the crystals. 
The native crystals were grown in hanging-drop plates at 20°C, with 50 nl protein-LCP 
drops overlaid with 800 or 1,600 nl precipitant solution, which consisted of 
32-34% PEG550MME, 100 mM Na-citrate (pH 4.5), 100 mM (NH4)2HPO4, and 
100 mM (NH4)2SO4. The heavy atom-derivatized crystals were prepared by the 
soaking method. After the native crystals were grown on hanging-drop plates to 
the full size, the overlaid crystallization solution was replaced with 2,400 nl of the 
solution supplemented with a slightly higher concentration of PEG550MME and 
1 mM CH3HgCl. The crystals were incubated at 20°C for 3 h. All of the crystals 
were flash-cooled in liquid nitrogen for data collection, using the reservoir solution 
as a cryoprotectant. 

Data collection and structure determination of SnYddG. All diffraction data 
sets were collected at the station BL32XU at SPring-8. Data sets were processed 
with the program XDS and the CCP4 suite. The data processing statistics are 
summarized in Extended Data Table 1. The structure was determined by the single 
isomorphous replacement with anomalous scattering method, using the native 
and CH3HgCl-soaked SnYddG crystals. Twenty-four Hg atom sites were identified 
with the program SHELXD. The initial phases were calculated with the 
program SHARP. The resulting phases were improved by solvent flattening with 
the program SOLOMON and six-fold non-crystallographic symmetry averaging 
with the program DM. The initial model was built into the map, using the program 
COOT. The model was subsequently improved through alternating cycles 
of manual building with COOT and refinement with the program PHENIX. 
The structural refinement statistics are summarized in Extended Data Table 1. 
Molecular graphics were illustrated with CueMol (http://www.cuemol.org/). 

Metabolomics analysis. To construct the assay strain W3110 (DE3) ΔyddG::Km 
containing the IPTG-inducible T7 polymerase expression unit (λDE3), 
the yddG gene knockout allele was transferred to the destination E. coli 
strain W3110 (DE3) by P1 phage transduction from the systematic E. coli 
knockout library strain, with kanamycin resistance as the selection 
marker. The knockout allele was confirmed by a PCR analysis, using the 
DNA primers 5′-ATAGCGGTAGAAAAACGCACCA-3′ and 
5′-TGAGATATAAGGTGAATTACTGGTATTTG-3′. E. coli strains W3110 (DE3) and W3110 (DE3) 
ΔyddG::Km cells, cultivated on LB plates, were inoculated into 5 ml M9 medium, 
containing 0.5% glucose, and shaken at 37°C. When the optical density reached 
0.5, IPTG was added to the medium (final concentration 1 mM), and subsequently, 
the cells were cultivated for approximately 24h. After the cells were removed 
by centrifugation and filtration, the supernatants were analysed by capillary 
electrophoresis-mass spectrometry (CE-MS) at Human Metabolome 
Technologies Inc. 

Liposome-based assay. The SnYddG protein for the liposome assay was purified by 
the same procedure as that for the crystallization, but 1.2% (w/v) DDM was 
used for solubilization and 0.25% (w/v) n-decyl-β-D-maltopyranoside (DM) was 
included in every step after solubilization to gel filtration. The EcYddG protein for 
the liposome assay was also purified by the same procedure as the SnYddG protein. 
The purified SnYddG and EcYddG proteins were reconstituted into liposomes 
by the following procedure. An E. coli polar lipid extract (Avanti) was dissolved 
in chloroform and dried into a thin film. This film was then resuspended to a 
final concentration of 20 mg ml⁻¹ in 10 mM HEPES buffer (pH 7.0) containing 
100 mM NaCl, and sonicated for 2 min to obtain the liposome solution. The purified 
proteins were added to the liposome solution at a lipid to protein ratio of 50:1 
(w/w). The protein–liposome mixtures were incubated at 4°C for 30 min, and then 
ultra-centrifuged (200,000g) at 4°C for 3 h to remove the detergent. The proteoliposomes 
were re-suspended to a final concentration of 20 mg ml⁻¹ and stored 
at —80°C. Protein-free liposomes were prepared by a similar procedure, except 
that the protein solution was replaced with the buffer used for the final purification 
step. The liposomes were sonicated immediately before the measurements to prepare 
uniformly sized liposomes. The time-dependent [¹⁴C]Thr (175 mCi mmol⁻¹; 
Moravek Biochemicals) uptake assay was initiated by mixing the liposome solution 
(45 µl) with an equal volume of the extraliposomal solution, consisting of 10 mM 
HEPES (pH 7.0), 100 mM NaCl, 100 µM amino acid, and 2% (v/v) [¹⁴C]amino 
acid. After the reaction at 37°C, aliquots (20 µl) of the reaction mixture were 
isolated by Sephadex G-50 (GE Healthcare) gel filtration, and the radioactivity of the 
incorporated [¹⁴C]amino acid was measured by liquid scintillation counting. The 
Met-, Glu- and Lys-uptake assays were also performed under conditions similar to 
those of the Thr-uptake assay, using [¹⁴C]Met, [¹⁴C]Glu and [¹⁴C]Lys (55 mCi mmol⁻¹, 
210 mCi mmol⁻¹ and 288 mCi mmol⁻¹, respectively; American Radiolabelled 
Chemicals). For mutational analyses, mutations were introduced by a PCR-based 
method. The mutant proteins were expressed, purified, and reconstituted into 
liposomes, and the transport activities were measured by a similar procedure to 
that for the wild type. The assays were initiated by mixing the liposome solution 
(25 µl) with an equal volume of the extraliposomal solution, and after the 30-min 
reaction at 37 °C, aliquots (20 µl) were isolated for the subsequent radioactivity 
measurement. The reconstitution rates of the wild-type and mutant proteins were 
determined by fractionating the proteoliposome samples on an SDS-PAGE gel, 
and quantifying the amount of SnYddG protein by an LAS-3000 image analyser. 
All assays were repeated three times. Error bars represent s.d. 

Cysteine cross-link analysis of SnYddG. The Cys-free mutant of SnYddG (C159A/ 
C185A/C195A/C271A) was constructed using a PCR-based method. The double- 
Cys mutants (A21C/P91C, A21C/A92C, T24C/A85C, T24C/L86C, A138C/A266C) 
were also constructed using a PCR-based method, based on the Cys-free mutant. 
The mutant proteins for the cross-link analysis were prepared by the same procedure 
as that for the crystallization, but 1.2% (w/v) DDM was used for solubilization and 
0.03% (w/v) DDM was included in every step after solubilization to gel filtration. 
The purified mutant proteins were incubated with 10 mM tris(2-carboxyethyl)phosphine 
(TCEP) or 1 mM copper phenanthroline [Cu(phen)3] at 37°C for 
30 min, followed by trichloroacetic acid precipitation. The pellets were dissolved 
in SDS-PAGE sample buffer, containing 1% SDS and 20 µM tetramethylrhodamine 
maleimide (TMRM), incubated at 37°C for 90 min, and then analysed by SDS- 
PAGE. The fluorescence of the TMRM-modified proteins was visualized with an 
LAS-3000 image analyser. 

For quantification of intramolecular disulphide formation by the SnYddG 
double-Cys mutants, the proteins were subjected to reductive or non-reductive 
carboxymethylation. For reductive carboxymethylation, the protein (~2 µg) was dissolved in 
20 µl of 1% dithiothreitol in 6 M guanidine hydrochloride, 1 M Tris-HCl (pH 8.5) 
and 10 mM EDTA, and was heated at 80°C for 30 min. After cooling, alkylation was 
performed by the addition of 2 µl of a 25% iodoacetic acid solution in 1 N NaOH 
and an incubation at room temperature for 30 min in the dark. For non-reductive 
carboxymethylation, the protein was dissolved in 2.5% iodoacetic acid in 
6 M guanidine HCl, 1 M Tris-HCl (pH 8.5) and 10 mM EDTA. The reaction 
mixtures were desalted with a Sephadex G-25 syringe (1 ml), and pooled fractions 
of the carboxymethylated protein were dried and hydrolysed in 6 N HCl vapour 
at 110°C for 20 h. The acid hydrolysate was derivatized with 
6-aminoquinolyl-N-hydroxysuccinimidyl carbamate, and was quantified as described previously 
(Extended Data Fig. 6b). 
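
The two carboxymethylation measurements can be combined into a disulphide percentage. The following is a minimal sketch, assuming that iodoacetic acid labels only free thiols under non-reductive conditions and all Cys residues under reductive conditions, so that the disulphide-bonded fraction is one minus the ratio of the two carboxymethyl-Cys amounts. The function name and the example amounts are hypothetical and are not taken from this study.

```python
def disulphide_percent(cm_cys_nonreduced: float, cm_cys_reduced: float) -> float:
    """Estimate the percentage of Cys engaged in disulphide bonds.

    cm_cys_nonreduced: carboxymethyl-Cys amount after non-reductive treatment
                       (assumed to report free thiols only).
    cm_cys_reduced:    carboxymethyl-Cys amount after reductive treatment
                       (assumed to report total Cys).
    Both amounts must be in the same units (for example, pmol).
    """
    if cm_cys_reduced <= 0:
        raise ValueError("the reductive measurement must be positive")
    # Fraction of Cys that was free (not disulphide-bonded), clamped to [0, 1]
    # so that measurement noise cannot give a negative percentage.
    fraction_free = min(max(cm_cys_nonreduced / cm_cys_reduced, 0.0), 1.0)
    return 100.0 * (1.0 - fraction_free)


# Hypothetical amounts: 0.5 units labelled without reduction, 2.0 with reduction.
example = disulphide_percent(0.5, 2.0)
```

The clamp is a design choice for noisy amino acid analysis data; it can be removed if negative values should be reported as they are.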

Co-evolutionary analysis. The co-evolutionary analysis of YddG homologues and 
other DMT proteins was performed using the EVcoupling web interface 
(http://evfold.org/evfold-web/evfold.do). The default parameters of the web interface 
were employed for the calculations, except that the E-value threshold for generating 
the sequence alignment was changed to −30. The resulting number of 
YddG and DMT homologue sequences used for the calculation was 59,114. The 
subsequent 3D-structure prediction of SnYddG was performed by the program 
EVfold_membrane. The default parameters of the web interface were employed 
for the calculation, except that the ‘membrane protein’ option was turned on and 
the numbers of flanking upstream and downstream residues in the secondary 
structure prediction were changed to 0. 

Molecular dynamics simulation. The atomic coordinates of the crystal structure of 
SnYddG (molecule B) were used for the simulation. The disordered region (Ala138-Gly144) 
was modelled by adding the corresponding coordinates in molecule E. All 
of the water molecules observed in the crystal structure were kept. The missing 
hydrogen atoms were built with the program VMD. A periodic boundary system, 
including explicit solvent and a palmitoyloleoylphosphatidylethanolamine 
(POPE) lipid bilayer, was prepared. The net charge of the simulation system 
was neutralized through the addition of 150 mM NaCl. The simulation system 
was 96 × 96 × 96 Å, and contained 80,530 atoms. The molecular topologies and 
parameters from the Charmm36 force field were used for the protein, lipid and 
water molecules. 

Molecular dynamics simulations were performed with the program NAMD 
2.10 (ref. 40). The systems were first energy minimized for 1,000 steps with fixed 
positions of the non-hydrogen atoms, and then for another 1,000 steps with 
10 kcal mol⁻¹ restraints for the non-hydrogen atoms, except for the lipid molecules 
within 5.0 Å from the proteins. Next, equilibrations were performed for 0.01 ns 
under NVT conditions, with 10 kcal mol⁻¹ restraints for the heavy atoms of the 
protein. Finally, equilibrations were performed for 0.5 ns under NPT conditions 
with the 1.0 kcal mol⁻¹ restraints. In the equilibration and production processes, 
the pressure and temperature were set to 1.0 atm and 310 K, respectively. Constant 
temperature was maintained by using Langevin dynamics. Constant pressure 
was maintained by using the Langevin piston Nosé–Hoover method. Long-range 
electrostatic interactions were calculated by using the particle mesh Ewald 
method. The production run of the equilibrium simulation was performed for 
500 ns, starting from the crystal structure. The outward-to-occluded simulation 
run was also performed for 500 ns, starting from the crystal structure, with a harmonic 
distance restraint (force constant = 10.0 kcal mol⁻¹ Å⁻²) between centres of 
mass of the Cα atoms of TM4a (Pro91-Ala98) and TM9b (Ala246-Leu253). The 
equilibrium distance of the harmonic restraint was gradually decreased from 15 Å 
to 9 Å during the 500-ns run. Next, the occluded-to-inward simulation run was 
performed, starting from the final snapshot of the outward-to-occluded simulation. 
A similar harmonic distance restraint was applied between centres of mass of the 
Cα atoms of TM4b (Trp101-Phe108) and TM9a (Val237-Ser244), with the equilibrium 
distance gradually increased from 9 Å to 15 Å. 
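
The moving harmonic restraint described above can be illustrated with a short sketch. It rests on two assumptions that the text does not state: that the equilibrium distance changes linearly with simulation time, and that the restraint energy has the form 0.5 × k × (d − d0)² (simulation packages differ on whether the factor of 0.5 is included). The function names are hypothetical; this is not the NAMD configuration used for the simulations.

```python
import math


def restraint_centre(t_ns, total_ns=500.0, start=15.0, end=9.0):
    """Equilibrium distance (Å) at time t_ns, assuming a linear schedule."""
    frac = min(max(t_ns / total_ns, 0.0), 1.0)
    return start + (end - start) * frac


def centre_of_mass(coords):
    """Centre of mass of a group of Cα atoms.

    All Cα atoms are carbons with equal mass, so the centre of mass
    is simply the mean position.
    """
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def restraint_energy(group_a, group_b, t_ns, k=10.0, **schedule):
    """Harmonic restraint energy (kcal/mol) between two groups of atoms.

    k is the force constant in kcal mol^-1 Å^-2; the 0.5 prefactor is an
    assumed convention.
    """
    d = math.dist(centre_of_mass(group_a), centre_of_mass(group_b))
    d0 = restraint_centre(t_ns, **schedule)
    return 0.5 * k * (d - d0) ** 2


# Outward-to-occluded run: the target distance shrinks from 15 Å to 9 Å.
midpoint_target = restraint_centre(250.0)
# Occluded-to-inward run: the same schedule reversed, from 9 Å to 15 Å.
reverse_target = restraint_centre(250.0, start=9.0, end=15.0)
```

Passing `start=9.0, end=15.0` reproduces the reversed schedule of the occluded-to-inward run under the same linearity assumption.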

Statistical analyses. The statistical significance of differences in mean values was 
calculated using unpaired, two-tailed Student's t-test. No statistical methods were 
used to predetermine sample size. 
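
For reference, the statistic behind an unpaired Student's t-test can be computed as in the sketch below. It assumes equal variances in the two groups (the classical pooled-variance form); the function name and the example values are hypothetical and are not data from this study. Converting the statistic into a two-tailed P value requires the t distribution, which is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for an unpaired Student's t-test.

    Uses the pooled sample variance, so equal variances are assumed.
    Each sample needs at least two values.
    """
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t, df


# Hypothetical triplicate measurements for two conditions.
t_value, dof = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

With triplicate assays in each group, as used here for the transport measurements, the test has four degrees of freedom.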
Extended Data Figure 1 | Metabolomics analysis of the E. coli W3110 
wild-type (WT) and YddG knockout (ΔyddG) strains, containing 
either the empty vector or the EcYddG protein expression vector. 
We analysed the 110 well-known metabolites in the culture media of 
wild-type and ΔyddG cells, transformed with an empty vector or the 
pET-derived vector containing the EcYddG gene. The amounts of the 
metabolites in the media that exhibited more than threefold increases in 
both the wild-type and ΔyddG strains are plotted. The results showed the 
consistent increases of valine, threonine and isoleucine, in the medium 
of the EcYddG-expressing cells, as compared to the negative controls. 
These results are consistent with those from the in vitro liposome-based 
analysis of EcYddG (Fig. 1), supporting the notion that YddG transports a 
wide range of metabolites, as well as amino acids. 
Extended Data Figure 2 | The 2.4-Å electron density map and molecular surface of SnYddG. a, b, A stereo view of the unbiased 2mFo − DFc maps 
around TM4 and TM9 (a), and an overall view of molecule B in the crystal asymmetric unit, contoured at 1.1σ (b) are shown. c, Molecular surface of 
SnYddG, viewed from different directions. The transmembrane segments are coloured as in Fig. 2. 
Extended Data Figure 3 | Conserved residues of YddG and SMR 
homologues. a, Amino acid sequence alignment of SnYddG and other 
YddG homologues, from Bradyrhizobium japonicum (UniProtKB 
ID: A0A0A3YL06), Virgibacillus halodenitrificans (GenBank ID: 
CDQ34001.1), Serratia marcescens (UniProtKB ID: L0MHZ3), 
Pectobacterium carotovorum (UniProtKB ID: A0A0E2ZTH4), 
E. coli (UniProtKB ID: P46136), Vibrio vulnificus (UniProtKB ID: 
Q7MDM6), Acinetobacter baumannii (UniProtKB ID: A0A009YGQ0O), 
Bifidobacterium merycicum (UniProtKB ID: A0A087BD75), Phaeobacter 
inhibens (UniProtKB ID: I7EXD9), and Pseudomonas sp. (UniProtKB ID: 
B2Z3V9). The conserved amino acids involved in the transport 
mechanism and discussed in the main text are indicated. b, Amino acid 
sequence alignment of the N and C halves of SnYddG. c, Amino acid 
sequence alignment of EmrE homologues, from E. coli (UniProtKB ID: 
P23895), S. enterica (UniProtKB ID: X5AXHO0), Xanthobacter 
autotrophicus (UniProtKB IDs: A7IJY9 and A7IN30), Saccharomonospora 
viridis (UniProtKB IDs: C7MRM1 and C7MSE0), Thermobifida fusca 
(UniProtKB IDs: Q47R92 and Q47QH0), Thermus thermophilus 
(UniProtKB ID: Q72K82), Thermobispora bispora (UniProtKB IDs: 
D6YBB5 and D6Y4B6), Thermomonospora curvata (UniProtKB IDs: 
D1A404 and D1AD18), and Zymomonas mobilis (UniProtKB IDs: 
Q5NPN9 and Q5NRC2). 
Extended Data Figure 4 | Molecular evolution of the DMT superfamily. 
a, Crystal structure of E. coli EmrE (PDB accession 3B5D). The 
bound tetraphenylphosphonium (TPP) is shown as a stick model. The 
transmembrane segments are coloured and labelled as in d. b, The present 
crystal structure of SnYddG. Only the transmembrane segments are 
shown in cylinder models. The transmembrane segments are coloured 
and labelled as in d. c, Molecular superimposition of EmrE and SnYddG, 
calculated by the SSM algorithm. The cylinder model of EmrE is shown 
with the same colouring as in d. The semi-transparent cylinder model of 
SnYddG is shown with the same colouring as in d. d, Schematic illustration 
of the possible evolutionary path of the DMT family transporters. The 
transmembrane segments are indicated by circles, and the connecting 
loops are indicated by solid and dotted curves. The black arrows 
indicate the folding of the transporter, while the grey arrows indicate 
the possible evolutionary pathway. Transmembrane segments originating 
from one protomer of SMR are coloured blue, while the newly inserted 
transmembrane segments (TM a) are coloured red. 
Extended Data Figure 5 | bee tek of the reconstituted 
proteoliposomes containing the SnYddG mutants. a, The 
chromatograms of the final purification step by the gel-filtration 
chromatography column. The results of the wild type, as well as the 
mutants with the decreased activities in the liposome-based assay (that is, 
G71A, H79A and G222A), are shown. b, Reconstitution rates of the 
wild-type and SnYddG mutants, determined by the SDS-PAGE analysis. 
The reconstitution rates were measured as a percentage of the total amount 
of protein used for the reconstitution. 
Extended Data Figure 6 | Cys-crosslink analysis of the residues on the 
periplasmic sides of TM1 (Ala21, Thr24) and TM3 (Ala85, Leu86, 
Pro91, Ala92). a, The Cys-free mutant of SnYddG (C159A/C185A/ 
C195A/C271A) was created, and then pairs of Cys residues to the residues 
in TM1 and TM3 or TM1 and TM4, which may form the intracellular gate, 
were introduced. The purified double-Cys mutant proteins were oxidized 
or reduced, and then modified by the Cys-reactive fluorescent reagent, 
tetramethylrhodamine maleimide (TMRM). The double-Cys mutants 
were completely masked from the Cys-modification by TMRM under 
the oxidized conditions, indicating that these residues are close enough 
to form disulfide bonds. b, The results of the quantification of the amino 
acid composition of the cross-linking products of the YddG double-Cys 
mutants. The percentages of disulfide bond formation for each mutant 
were calculated from the amounts of carboxymethylated Cys under the 
oxidizing and reducing conditions. c, The positions of the mutated residues 
are indicated in the crystal structure viewed from the periplasmic side. 
[Panel a: contact maps. Axes: Residue No.; colour scale: Min Score to Max Score.]

Top panels

Residue pair    EC score    Distance (Å)
 91   255       0.5117      16.38
 91   259       0.4537      20.29
 91   261       0.4286      16.99
 96   271       0.4052      16.77

Residue pair    EC score    Distance (Å)
105   241       0.7390       5.68
128   245       0.4817       5.20
115   238       0.2907      10.21

Bottom panels

Residue pair    EC score    Distance (Å)
 24    83       0.6543      13.71
 33    80       0.4155       8.35
 24    86       0.3734      13.75
 22    91       0.2901      15.08
 36    83       0.2707      10.49
 21    82       0.2539      14.78
 21    95       0.2355      11.16
 25    86       0.2226      15.63

Residue pair    EC score    Distance (Å)
170   229       0.4565       6.95
171   236       0.3705      10.28
171   237       0.3213       8.66
167   228       0.3042       8.17
163   228       0.2344      11.87
166   229       0.2330       9.87
167   241       0.2319       9.37

Extended Data Figure 7 | Evolutionary covariation analysis of YddG 
and DMT protein homologues. a, Contact maps of the top-ranked 170 
evolutionary constraints (ECs), calculated from the 59,114 homologue 
sequences using the program EVcoupling. The ECs are indicated by stars, 
coloured proportionally according to their EC scores, from orange to red. 
The solid and dashed rectangles indicate the ECs corresponding to the 
observed extracellular gate and the putative intracellular gate interactions, 
respectively. b, The ECs mapped onto the outward-open crystal structure 
of SnYddG. The ECs indicated in a and tabulated in the top and bottom 
panels are indicated as arrows connecting the corresponding residues 
in the SnYddG structure. The solid and dashed arrows indicate 
the ECs corresponding to the observed extracellular gate and the putative 
intracellular gate interactions, respectively. c, The SnYddG structure 
predicted from the top-ranked 170 ECs, using the program 
EVfold_membrane. 
Extended Data Figure 8 | The kinks of TM3 and TM8 around the 
conserved Gly and Pro residues. a, The kinks of TM3 around the 
conserved Gly residues, in the outward-open crystal structure and the 
inward-open model. b, The kinks of TM8 around the conserved Gly and 
Pro residues, in the outward-open crystal structure and the inward-open 
model. In a and b, the Gly and Pro residues are shown by stick models, and 
the axes of the transmembrane helices are indicated by cylinders. In the 
right panel, the activities of SnYddG mutants of the TM3 and TM8 kinks, 
measured by the liposome-based analysis using threonine as a substrate, 
are plotted, respectively. The substrate uptake activities measured after 
30 min were plotted as percentages of the wild-type transport activity. 
Error bars, s.d.; n = 3. 
Extended Data Figure 9 | Molecular dynamics simulation of SnYddG. 
a, The results of the 500-ns run of the non-biased simulation. Left, a plot 
of the root mean square fluctuations of each Cα atom from the initial 
crystal structure. Right, a plot of the root mean square deviation over 
all Cα atoms from the initial crystal structure, during the course of the 
simulation. b, The results of the outward-to-occluded simulation. Top 
left, a plot of the distances between the Cα atoms of TM4a (Pro91-Ala98) 
and TM9b (Ala246-Leu253), and between TM4b (Trp101-Phe108) 
and TM9a (Val237-Ser244). The equilibrium distance of the harmonic 
restraint applied between TM4a and TM9b is plotted with a red line. 
Top right, time series of the distances between the Cα atoms of Ser167 
(TM6) and Trp228 (TM8), and between Ala21 (TM1) and Leu86 (TM3). 
Bottom left, time series of the distances between the Cα atoms of TM1 
(17-24) and TM3 (81-88) and between the Cα atoms of TM6 (163-170) and TM8 
(225-232). Bottom right, the final snapshot of the outward-to-occluded 
simulation. The Cα atoms of Ala21 (TM1) and Leu86 (TM3) are shown 
as spheres. The parts of TM4a and TM9b subjected to harmonic restraints 
are coloured grey. The arrows indicate the distances plotted in the graphs 
in b and c. c, The results of the occluded-to-inward simulation. Top left, 
plot of the distance between the Cα atoms of TM4a (Pro91-Ala98) and 
TM9b (Ala246-Leu253), and between TM4b (Trp101-Phe108) and TM9a 
(Val237-Ser244). The equilibrium distance of the harmonic restraint 
applied between TM4b and TM9a is plotted with a red line. Top right, 
time series of the distances between the Cα atoms of Ser167 (TM6) and 
Trp228 (TM8), and between Ala21 (TM1) and Leu86 (TM3). Bottom left, 
time series of the distances between the Cα atoms of TM1 (17-24) and 
TM3 (81-88) and between the Cα atoms of TM6 (163-170) and TM8 
(225-232). Bottom right, the final snapshot of the occluded-to-inward 
simulation. The Cα atoms of Ser167 (TM6) and Trp228 (TM8) are shown 
as spheres. The parts of TM4b and TM9a subjected to harmonic restraints 
are coloured grey. The arrows indicate the distances plotted in the graphs 
in b and c. 
Extended Data Table 1 | Data collection and refinement statistics 

                        Native                    CH3HgCl derivative
Data collection
  X-ray source          SPring-8 BL32XU           SPring-8 BL32XU
  Wavelength (Å)        1.0                       1.0
  Space group           P2                        P2
  Cell dimensions
    a, b, c (Å)         105.84, 84.65, 112.25     106.67, 85.89, 112.85
    α, β, γ (°)         90, 108.46, 90            90, 109.44, 90
  Resolution (Å)        50-2.4 (2.54-2.4)*        50-3.5 (3.59-3.5)
  Rsym                  0.120 (1.269)             0.272 (1.26)
  I/σ(I)                13.25 (1.91)              6.11 (1.54)
  Completeness (%)      99.8 (99.8)               99.8 (99.9)
  Multiplicity          8.47 (8.45)               4.95 (4.98)
  CC1/2                 0.998 (0.619)             0.989 (0.572)

Refinement
  Resolution (Å)        50-2.4
  No. reflections       73,750
  Rwork/Rfree           0.2264/0.2495
  No. atoms
    Protein             11,564
    Ligand/ion/lipid    349
    Water               ul
  B-factors (Å²)
    Protein             66.44
    Ligand/ion/lipid    69.43
    Water               47.96
  R.m.s. deviations
    Bond length (Å)     0.003
    Bond angle (°)      0.751
  Ramachandran plot
    Favored (%)         98.07
    Allowed (%)         1.87
    Outliers (%)        0.06

*Values in parentheses are for highest-resolution shell. 
COURTESY OF THE GLADSTONE INSTITUTES 


A SIMPLER TWIST OF FATE 


Ways to directly convert one mature cell type into another may eventually offer a safer, 
faster strategy for regenerative medicine. 


The green heart muscle cells are ‘natural’. The orange ones were fibroblasts that have been directly reprogrammed to become heart muscle cells. 


BY MICHAEL EISENSTEIN 


Until the day it dies, a cell that has 
become a skin cell remains a skin cell 
— or so scientists used to think. Over 
the past decade, it has become clear that cel- 
lular identity is not written in stone but can 
be rewritten by activating specific genetic pro- 
grams. Today, the field of regenerative medi- 
cine faces a question: should this rewriting 
take the conventional route, in which mature 
cells are first converted back into stem cells, or, 
where feasible, a more direct approach? 
‘Terminally differentiated’ is a term that 
sums up the old way of thinking — that skin, 
muscle or other mature cells cannot be coaxed 
to adopt a drastically different fate. That idea 
began to falter a decade ago, when cell biolo- 
gist Shinya Yamanaka of Kyoto University in 
Japan showed that a handful of genes could 
transform adult fibroblast (connective tissue) 
cells into induced pluripotent stem (iPS) 
cells. Like embryonic stem cells, iPS cells can 
develop into any cell type, a property called 
pluripotency. They can also be produced in 
unlimited quantities, unlike embryonic stem 
cells, which must be harvested from human 
embryos and therefore come with considerable 
political baggage. 

Just a few years after Yamanaka’s discovery 
— which earned him a share of the 2012 Nobel 
Prize in Physiology or Medicine — researchers 
began uncovering shortcuts for switching cell 
types that they called ‘direct reprogramming’. 
Mature cells of one kind could be coaxed to 
directly become another, with no pluripo- 
tent middleman. Researchers have learned 
how to turn skin cells into neurons or heart 
cells, and stomach cells into insulin-producing 
pancreatic β-cells. “It’s amazing to watch 
the cells change right before your eyes,” says 
Benedikt Berninger of the Johannes Gutenberg 
University of Mainz in Germany, who uses 
direct reprogramming to generate neurons. 

Research into direct reprogramming is more 
preliminary than work on iPS cells, but it is 
stirring excitement in regenerative medicine. 
Directly reprogrammed cells might be safer 
than cells that pass through a pluripotent 
state, because the latter share with tumour 
cells a capacity for extensive proliferation 
— making them potentially cancer-causing 
Trojan horses. 

Clinical interventions based on iPS cells 
must be done carefully to ensure that no pluri- 
potent cells are transplanted along with the 
fully mature cells. “There’s a risk that you could 
lose control of these cells and that they start 
proliferating uncontrollably after transplanta- 
tion,” says Malin Parmar, a neurobiologist at 
Lund University in Sweden who hopes to use 
direct reprogramming to reverse the loss of 
neurons in people with Parkinson’s disease. 
“But if you bypass the pluripotent stage, it’s 
a lot quicker and potentially safer.” 


CHANGING PROGRAMS 

Rewriting cellular identities first requires an 
understanding of how those identities are 
established. Every cell in the body can trace 
its ancestry back to a single progenitor: the 
fertilized egg. As embryonic cells divide and 
mature, their destiny is determined by the spe- 
cific genes that are switched on and off over the 
course of development. Proteins called tran- 
scription factors regulate this process by bind- 
ing certain DNA sequences in the genome, and 
subsequently activating or suppressing adja- 
cent genes. The ones that govern the fate of a 
developing cell are often called master regu- 
lators because they operate at the summit of 
complicated cascades of gene activity. 

“These master regulators are basically all 
defined by their pivotal roles in embryogenesis 
in the development of certain cell types,” says 
Qiao Zhou, a cell biologist at Harvard Stem 
Cell Institute in Cambridge, Massachusetts. 
“Perhaps a progenitor cell can become cell A 
or B or C, but if you force it to express a certain 
master regulator, it will inevitably choose A.” 

An early demonstration of the usefulness 
of master regulators for direct reprogram- 
ming came as far back as 1987, when Harold 
Weintraub, Andrew Lassar and their colleagues 
at the Fred Hutchinson Cancer Research 
Center in Seattle, Washington, showed that 
forcing fibroblasts to express a certain por- 
tion of DNA put them on a developmental 
path to become muscle cells; they later discov- 
ered that the single gene responsible encodes 
the transcription factor MyoD. “That was a 
paradigm-shifting observation, and people 
in the field thought that most other cell types 
would have that one key factor that would be 
powerful enough to convert the fate of a cell,” 
says Deepak Srivastava, a heart development 
researcher at the Gladstone Institute of Cardio- 
vascular Disease in San Francisco, California. 

But it wasn’t that simple. The hunt for individual 
master regulators that could initiate 
reprogramming would yield many years of dis- 
appointment — until Yamanaka's work on iPS 
cells revealed that the secret of effective repro- 
gramming was not a single factor, but rather 
combinations of multiple genes. As researchers 
started to mix and match different sets of mas- 
ter regulators, success stories began to emerge. 

In 2008, Zhou was part of a team led by 
Harvard scientist Douglas Melton that trans- 
formed one type of pancreatic cell into another, 
generating the insulin-secreting β-cells that are 
needed by many people with diabetes. “Our 
study concluded that you need a minimum of 
three master regulators to make that happen,” 
says Zhou. In 2010, a group led by stem-cell 
scientist Marius Wernig of Stanford University 
in California turned fibroblasts into neurons, 
also using a trio of genes. Further refinements 
and extensions of this work gave rise to a host 
of different, specialized neurons, with each 
type producing or responding to distinct neu- 
rotransmitter signals. 

Most of these pioneering demonstrations of 
direct reprogramming have been achieved with 
cultured cells. Yet many researchers see much 
greater promise for regenerative medicine if cell 
conversions can be prompted inside the body. 
Pools of cells that are relatively abundant in an 
organ could be transformed into other kinds of 
mature cells that are more desperately needed. 
So far, there have been a handful of triumphs in 
animal experiments. Parmar’s group, for exam- 
ple, found that glial cells can be converted into 
functional neurons by injecting viruses laden 
with genes for reprogramming factors into the 
brains of mice. And Srivastava has likewise 
turned mouse fibroblasts inside the heart into 
beating cardiac muscle cells, a strategy that may 
offer a way to repair damage caused by a heart 
attack. “You've got this vast pool of cells that 
are already in the organ that you can harness 
for regeneration,” he says. But no one has so
far tried direct reprogramming inside a human. 


IDENTITY CRISIS 

For now, most research is focused on ensur- 
ing the success of the reprogramming process. 
Investigators not only have to work out a suc- 
cessful combination of master regulators that 
turns on the genes that define a certain cell 
type: they also, ideally, have to discover the 
smallest possible set. This is because the most 
reliable way to force a cell to express master
regulator genes is to deliver additional copies
of these genes to it, and delivering many
genes into a cell is a much tougher technical 
challenge than providing just a few. Working 
out the minimal set of master regulators can 
be a slog: often the roster of candidate com- 
binations is huge, and the only way through a 
thicket of options is to systematically test each 
one. Parmar’s team started with 12 candidate 
genes for generating dopamine-producing 
neurons, for example, before eventually nar- 
rowing it down to 2. 
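
To give a sense of the scale of that search, the sketch below counts the possible gene sets for 12 candidates and then tests them in order of increasing size until a working set turns up. It is a minimal illustration in Python, not the procedure used by any of the labs described here: the gene names are placeholders, and `toy_assay` is a hypothetical stand-in for a real reprogramming experiment.

```python
from itertools import combinations
from math import comb

# Placeholder names: the article does not list the 12 candidate genes.
candidates = [f"gene_{i}" for i in range(1, 13)]


def count_subsets(n, max_size):
    """Count the non-empty gene sets of size <= max_size drawn from n candidates."""
    return sum(comb(n, k) for k in range(1, max_size + 1))


def smallest_working_sets(genes, works):
    """Test sets in order of increasing size; return every working set of the smallest size."""
    for size in range(1, len(genes) + 1):
        hits = [set(c) for c in combinations(genes, size) if works(set(c))]
        if hits:
            return hits
    return []


def toy_assay(gene_set):
    """Hypothetical assay: pretend conversion succeeds only if gene_3 and gene_7 are both present."""
    return {"gene_3", "gene_7"} <= gene_set


print(count_subsets(12, 12))  # 4095 non-empty subsets of 12 genes
print(comb(12, 2))            # 66 distinct pairs
print(smallest_working_sets(candidates, toy_assay))
```

Each test in a real screen is a laboratory experiment rather than a function call, which is why trimming the candidate list before testing matters so much.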

Some researchers have started to create soft- 
ware specifically for direct reprogramming 
that incorporates information about which 
master regulators control the formation of 
tissues. A team spread over three continents 
has developed an experiment-planning tool 
called Mogrify, which brings together large
quantities of gene-expression data from a long 
list of cell types with rules about the gene net- 
works that different master regulators control. 
Mogrify uses these to predict the combination 
of reprogramming factors that will cause a 
desired cellular identity change. The idea is 
to provide researchers with a way to compu- 
tationally identify the fewest possible master 
regulator genes that can directly reprogram 
one particular cell type into another. 

But providing active master regulator genes 
isn't always enough to ensure complete repro- 
gramming: the master regulators may success- 
fully set a cell on a developmental path, but then 
leave it stranded in an immature, precursor
state. Then the task is to identify which additional
genes must be active to finish the process,
and add them to the delivery package.

Stem-cell biologist Hongkui Deng at Peking
University in Beijing struggled with this problem
for years. His initial efforts to directly turn
skin cells into liver cells through the forced
expression of master regulator genes alone
yielded cells that failed to perform key,
liver-like functions. Then, during a second round
of screening, he identified additional genes
that could complete the reprogramming. He
calls them maturation factors: genes that are
unimportant for initiating the conversion but
crucial for obtaining functionally mature cells.

Other researchers have found that they can
boost the success rate of direct reprogramming
by augmenting the effects of master regulator
genes with chemicals that act on cellular signalling
pathways to promote reprogramming.
Occasionally, chemicals alone can prompt a
cell-type transformation (see ‘Better modifying
through chemistry’).


Better modifying through chemistry


Transcription factors are a natural choice
for reprogramming overall gene activity,
but the genes encoding them first have to
be delivered into target cells. This process
is laborious and raises potential safety
concerns for clinical applications.

Chemical biologist Sheng Ding has found
an alternative method: inducing direct
reprogramming with cocktails of chemicals.
In his lab at the Gladstone Institute of
Cardiovascular Research in San Francisco,
California, he has spent much of the past
decade building a library of compounds
that can greatly modify gene expression.
Using chemical agents such as A83-01 and
LDN193189, which can switch off certain
cellular signalling pathways, Ding has
successfully reprogrammed adult cells with
no transcription factors whatsoever.

He thinks his approach is much less
artificial because he externally triggers
innate cellular mechanisms that lead to
reprogramming, rather than abruptly
forcing cells to produce proteins that they
normally wouldn’t. “It’s really a gradual
reprogramming process,” he says.

In a pair of recent experiments, Ding and
his colleagues converted fibroblasts into
neural stem cells and cardiac muscle by
applying different cocktails, each consisting
of nine different compounds, some of them
pharmaceuticals. Although the results
are impressive, heart researcher Deepak
Srivastava, who collaborated with Ding at
Gladstone on this work, notes that the heart
muscle cells produced in this way are more
developmentally immature than those
reprogrammed with transcription factors.

But the approach offers far more
precise control over reprogramming in cell
culture than direct reprogramming with
transcription factors alone, and advocates
of Ding’s strategy think that it could avoid
some of the regulatory complications
around gene therapy. Only a handful of labs
have tried this chemistry-alone approach,
but many have quickly become converts,
among them Hongkui Deng of Peking
University in Beijing. “We see chemical
reprogramming as the future,” he says. M.E.

Even with the appropriate gene and chemi- 
cal deliveries, it is hard to prove that any direct 
reprogramming is truly complete. Peering 
through a microscope can reveal whether a 
transformation has taken place — for exam- 
ple, whether flat, star-shaped fibroblasts have 
formed long, axon-like projections — but 
deeper analysis of the cell’s inner workings is 
also needed. Put simply, how can one be certain 
that a reprogrammed skin cell has truly become 
a neuron, and is not merely ‘neuron-like’? 

Measuring the downstream activity of mas- 
ter regulator genes can offer insights into how 
well reprogramming has succeeded. If the 
introduced master regulators are doing their 
job, they should cause grand shifts in the overall 
patterns of gene expression in the cell nucleus, 
which should match the patterns found in 
mature cells of the target tissue. There are sev- 
eral ways to survey a cell's total gene expression 
— for example, sequencing all of the RNA mol- 
ecules in it. Researchers at Boston University 
and Harvard University in Massachusetts have 
drawn on this kind of data in their development 
of CellNet, a software program that can assess 
how well the gene activity in reprogrammed 
cells matches that of target cells.
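
One simple way to picture such a comparison is to correlate the expression profile of a reprogrammed cell with those of the target and starting cell types. The Python sketch below does this with Pearson correlation on invented numbers. It is an illustration under stated assumptions only and does not reproduce CellNet's actual method or data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up expression levels for the same five genes (arbitrary units).
target_neuron = [9.0, 8.5, 0.5, 0.2, 7.0]
starting_fibroblast = [0.3, 0.4, 9.1, 8.7, 1.0]
reprogrammed = [7.5, 8.0, 2.0, 1.5, 5.5]

# A value near 1 means the profiles rise and fall together; a negative value means they oppose.
print(pearson(reprogrammed, target_neuron))
print(pearson(reprogrammed, starting_fibroblast))
```

In this toy case the reprogrammed profile tracks the target closely and runs opposite to the starting cell type. As the article goes on to stress, a good match in gene expression is still not proof of function.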

Still, the identity test that really matters is 
whether reprogrammed cells can functionally 
replace naturally differentiated cells. “If they 
look like neurons and have gene expression 
like neurons, that doesn’t mean they’re really 
neurons,” says Chun-Li Zhang, a neurobiologist
at the University of Texas Southwestern
Medical Center in Dallas. Convincing proof 
requires a battery of assessments, such as elec- 
trophysiological measurements that confirm 
whether a newly formed neuron is firing and 
is therefore capable of activating other neu- 
rons that are linked to it by synapses. No one 
characteristic can provide sufficient evidence 
in isolation, says Zhou. His group's attempts 
to reprogram liver cells into pancreatic β-cells
yielded only dysfunctional intermediates.
“They synthesized and released insulin in large
quantities, so much so that the animals died
from hypoglycaemia,” he says. This is because
the cells lacked pancreatic cells’ ability to sense
and respond to blood glucose levels.


Researchers studying reprogrammed neurons in Malin Parmar’s lab at Lund University in Sweden.

One of the findings of these diagnostic tests 
is that prompting reprogramming within a 
target organ often works better than efforts 
with cultured cells. “Most of our cells only
partially reprogram to cardiac muscle when
they’re on plastic,” says Srivastava. “But in their
natural environment, the majority go all the
way to a beating state, where they’re electrically
coupled with their neighbours.” This may
be due to chemical cues generated by other
neighbouring cells in the organ, or because 
of features of the 3D tissue environment that 
are hard to replicate in the lab. Whatever the 
reason, it bodes well for developing clinical 
applications. 


PATH TO THE CLINIC 

Researchers agree that there are many hur- 
dles to overcome before these methods can be 
tested in people. In general, human cells have 
proven more challenging to directly reprogram 
than mouse cells: they tend to take longer to go 
through the reprogramming process and often 
require additional transcription factors to those 
that are sufficient in animal experiments. 

Gene delivery also poses formidable chal- 
lenges, especially into organs such as the brain. 
In some cases, viruses that preferentially infect 
particular cell types could help to guide repro- 
gramming factors to specific sites of disease 
or injury, but delivery to unintended sites may 
still pose risks. 

Then there's the issue of ‘robbing Peter to 
pay Paul’. Transforming glia into neurons in
the brain reduces the number of glia there — 
which might pose a hazard. “These cells are not 
just for decoration,” says Berninger. “They have
important functions, and how do we replace
them if we take them away?” One possibility is 
to reprogram cells to a proliferative — but non- 
tumorigenic — neural progenitor state. That 
way, a few glia could yield numerous neurons. 

Heart treatments are probably closest to 
the clinic. Srivastava’s team has already begun 
studies to turn fibroblasts inside pig hearts into 
cardiac muscle cells. “We have initial proof of 
concept that even in a big heart like ours, we 
can achieve efficacy,” he says. The team is now
carrying out safety studies and refining their 
gene delivery method with the aim of gaining 
regulatory approval for human trials. Impor- 
tantly, heart fibroblasts are self-replenishing, 
so concerns over cell loss are less acute. 

Zhou's team is also making headway towards 
the clinic in its attempts to switch cultured 
human gastrointestinal cells directly into 
β-cells. The gut cells are easily obtained by
biopsy, and after cultivation and reprogram- 
ming they could, in theory, be transplanted into 
the pancreases of volunteers who have diabetes. 

Direct reprogramming is beginning to gar- 
ner interest from industry, although biotech- 
nology and pharmaceutical companies are not 
quite ready to jump in with both feet. Although 
research into iPS cells and embryonic stem 
cells has a head start in this respect, the gap 
may close as the advantages of direct cell-type 
switching come into focus. “There is not yet 
a comparable amount of resources and manpower
going into this approach,” says Zhou.
“But the field is quickly catching up, and I can’t 
wait to see where it’s going.”


Michael Eisenstein is a science writer based 
in Philadelphia, Pennsylvania. 
CAREERS



LANGUAGE STUDIES 


Learn the local lingo to get ahead


English is widely spoken in science, but mastering another 
language can open doors, especially when working abroad. 


BY CAMERON WALKER 


Josh Plotnik pursued a different kind of
instruction along with his PhD studies on the
psychology of animal behaviour at Emory
University in Atlanta, Georgia. Several times
a month, he would go to a nearby Buddhist
temple to practise writing and reading Thai with
native speakers.

He had decided to learn the language after 
finding field sites in Thailand where he could 
study elephants. He put his lessons to good use 
the following year, when he returned to the sites 
for his dissertation research. In rural Thailand, 
few people spoke more than a few words of 
English. So Plotnik threw himself into speaking 
Thai, no matter how ridiculous he might sound. 
He carried notebooks to write down phrases, 
got help from a US researcher who had spent
decades in the country and made local friends. 

Now at Mahidol University near Bangkok, 
Plotnik says that learning to speak (and read) 
Thai was essential for his doctoral research — 
as well as for his work today at the university, 
where he lectures in Thai, and as the executive 
director of Think Elephants International, a 
non-profit conservation organization that he 
founded in 2011. He would never, he says, have 
been able to develop relationships with locals 
who care for native elephants or talk to government
officials about the animals’ role in tourism.
“If you really want to have an impact in a
place, and develop collaborations, partnerships 
and relationships,” he says, “you need to learn 
the language.” 

Although English is the universal language 
of science, many scientists have found that 
learning to speak and read the native language 
of the nation in which they work or study can 
open doors to new research projects and job 
opportunities, and can enhance life satisfac- 
tion. Early-career scientists who plan to look 
for postdoctoral positions or fellowships 
abroad, or wish to collaborate with researchers 
around the globe can seek formal instruction 
before their arrival (see ‘Where to find your
voice’). Once in the country, they should spend
as much time as possible speaking, listening to, 
reading and writing the language. 


SMOOTH INTERACTIONS 

Although some international institutions 
conduct much of their official business in 
English, scientists who work in such places 
find that learning the native tongue can help 
to smooth interactions and relationships. 
Many of the Chilean staff at Cerro Tololo Inter- 
American Observatory, which has its offices in 
La Serena, Chile, speak English in addition to 
their native Spanish. But observatory director 
Steve Heathcote says that speaking Spanish can 
be helpful for visiting international researchers 
who need, for example, to choose optical filters 
— some of their names sound very similar in 
Spanish — or to navigate life in La Serena. “It 
gets complicated sometimes,” he says about 
potential misunderstandings in English. “And 
there’s plenty of room for confusion.”

Junior scientists who travel abroad may find 
that they can make important connections in 
the field far more easily by speaking in the local 
tongue. Wildlife researcher Owen Bidder grew 
up speaking Welsh and English, and wanted to 
learn German after he accepted a postdoc at the 
University of Veterinary Medicine Hannover
in Germany. As part of his fellowship, he
took German language classes and lived with a 
family in the nearby area for two months before 
starting his programme — to great effect. In the 
field, where he works with local hunters who 
trap foxes that he needs for blood samples, no 
one speaks much English (or any Welsh). His 
ability to chat with the hunters in German has 
forged a camaraderie: they grin at his linguistic 
mistakes and cheer his successes. It’s a victory, 
he adds, because they are wary of researchers 
after conflicts in the past over some of the ani- 
mals they hunt. By speaking in German, he says, 
“I’m trying to meet them halfway”.

His proficiency in German has also helped 
him in other ways. Last September, he presented
his research at an annual mammalian-biology
meeting in Germany. He gave most
of his talk in English, but introduced himself
and his work in German. Outside the presentation,
he answered questions in both German
and English. This, he says, helped attendees to
relate to his research and improved their comprehension
of his responses. “They’d say, ‘Oh,
now I understand’,” he says.


OVERCOMING INTIMIDATION 

There's little question that learning a language 
can be intimidating at first, even for those like 
Bidder who already speak more than one. It 
helps, says Gabriel Hernandez Valdivia, to 
practise the language in informal settings. 
Valdivia, who is a transportation-systems 
graduate student at the Technical University 
of Munich in Germany, knew early on that 
he wanted to study in the country because of
Germany’s transportation expertise. But even
though he had taken German courses in his
native Mexico, he was overwhelmed when he
first arrived in Germany. “It took me an hour
to read three pages,” he says of technical textbooks.
He worked to pick up technical vocabulary,
but his overall speaking skills improved
vastly after he joined extracurricular activities,
such as playing Frisbee, that let him hone his
language skills outside the classroom. He also
uses an app that provides closed captioning so
that he can see films in German and read along
in the same language.


LANGUAGE COURSES


Where to find your voice


Many universities provide language
lessons for their students, staff and
faculty members. Some institutions and
programmes are open to all.


United States

• Emory College Language Center in
Atlanta, Georgia, offers language instruction,
events, options for independent language
study and online resources
(go.nature.com/1srhufn).

• The University of Chicago’s English
Language Institute in Illinois runs three-week
intensive English courses every
summer (go.nature.com/1x2sizh).

• Middlebury Language Schools in Vermont
holds intensive courses each summer in
Arabic, Chinese, German, Japanese, Russian
and more (go.nature.com/25envje).

• Concordia Language Villages in Minnesota
holds week-long immersion courses for
adults in Spanish, German, French and
Japanese (go.nature.com/288h5e8).

• CRDF Global in Arlington, Virginia,
organizes multi-week, intensive English
courses for early-career scientists in a
wide range of countries. Participants
work on career skills, discuss research
and participate in excursions and social
events, all in English. The programme often
ends with a mock conference
(go.nature.com/1zizroz).

• An online manual, Scientific English as a
Foreign Language, covers short lessons in
scientific communication, from commonly
confused words and phrases to tips on
writing mathematical equations and papers
(go.nature.com/10/7frcr).


Germany

• The Alexander von Humboldt Foundation
in Bonn will cover the fees for four months
of intensive German courses, including
accommodation and a stipend, for research
fellows at German institutions and their
spouses (go.nature.com/1tseqt7).

• The Goethe Institute has a range of
intensive German courses, with some
tailored to the medical field. Others combine
language instruction with a four-week
internship in fields such as mechanical
engineering. Online tutorials are also
available (go.nature.com/22cilcp).


Japan

• The Japanese MEXT scholarship funds
six months at a Japanese language institute
(go.nature.com/25eolq9).


Latin America

• Cerro Tololo Inter-American Observatory,
based in La Serena, Chile, funds Spanish
courses for incoming staff and their families
(go.nature.com/20x3mkv).

• The National Autonomous University of
Mexico’s Learning Center for Foreigners
in Mexico City offers courses for all levels
of Spanish as well as classes focused on
Mexican culture (go.nature.com/1zjOq1d).


United Kingdom

• The University of Manchester has
programmes of varying lengths in both
general and academic English
(go.nature.com/22bsgu0).

• Studio Cambridge provides intensive
English courses for adult learners all year
round (go.nature.com/1ujpfwa).

• Cambridge University Press offers
Cambridge English for Scientists, a book on
written and spoken English that comes with
audio CDs (go.nature.com/1tzbypo). C.W.

When Bidder was first learning German, he 
picked out a few idioms to use in conversation 
that served as a source of amusement — and 
connection. One of his favourites is das Gelbe 
vom Ei — ‘the yellow of the egg’ — which he 
uses as the equivalent of the English phrase 
‘cream of the crop’. The hunters get a laugh out
of it, he says, because it’s a quaint expression. He 
says it so often that his colleagues have affixed a 
poster with the phrase to his office door. 

To boost comprehension and fluency, it 
helps to attend meetings. Ana San Gabriel, who 
is from Spain, went to Japan for a three-year 
veterinary-science fellowship at the University 
of Tokyo and was initially bored at the frequent 
meetings. Then she realized that they offered 
a great opportunity for her to learn technical 
and lab-related words and phrases. She soon 
felt much more comfortable asking students 
and staff members for help in Japanese. “If you 
learn the key words, it’s easier for you to speak 
about issues in the lab,” she says. 


THROW CAUTION TO THE WIND 

Even if you fret about sounding foolish, it’s 
important to keep throwing yourself into 
situations in which you’re forced to speak the
local language. When Maria Jimenez-Sanchez
started a PhD programme in Spain, she worried
that she wasn’t proficient enough in English to
compete for a postdoc abroad. So she did a
three-month exchange programme in a US lab,
and another in a Swiss lab where English was 
spoken. She gained enough confidence — and 
mastery of English — to apply for her neuro- 
science postdoc at the University of Cambridge, 
UK. Today, she’s the vice-president of the Span- 
ish Society of Researchers in the United King- 
dom, a group that offers support in areas such 
as career development and networking, mainly 
in English. International researchers need to 
lose their self-consciousness and talk, she says. 
“When people are listening to you, they just 
want to hear what you have to say.” 

There are other, more nuanced benefits to 
learning the native tongue of a country where 
you work or study. San Gabriel found that 
learning Japanese gave her insight into the 
nation’s culture and the hierarchal nature of 
Japanese society. In turn, she learned to rec- 
ognize how and when to seek opportunities 
such as grants, pay rises and advancement. 
“You have to learn the art of discussion,” she
says. “You can still say what you think, but
you have to learn where and when.”

Even if a researcher doesn't move to 
another country, learning a new language 
can be helpful for collaborating with col- 
leagues abroad and understanding the 
research in their field. While studying 
auklets as a PhD student in Newfoundland,
Canada, Alex Bond found that several crucial
papers and reports, as well as older publications
about the small seabirds from the
North Pacific Ocean, were in Russian. He 
could neither read nor speak it. 

He converted his laptop keyboard to 
Cyrillic and turned to Google Translate, 
Wikipedia and a Russian-to-English dictionary
for help. Soon, he could recognize
names of places and species. When
he started a postdoc at the University of
Saskatchewan in Canada, he arranged
for tutoring in Russian and, after two
years, his reading skills had improved.
He felt that he was more of an asset
to the Russian researchers with whom he 
was collaborating — he could understand 
papers that they wanted him to read, and 
incorporate studies in English into their 
co-authored papers. “Just because some- 
thing’s not in English doesn’t mean you 
should ignore it,” says Bond, now a senior 
conservation scientist at the Royal Society 
for the Protection of Birds in Sandy, UK. 
Fluency in the tongue of one’s adopted 
nation also has advantages that may not 
directly affect research, but can boost life 
satisfaction. Heathcote spoke little Span- 
ish for the first 3 years after he arrived in 
Chile more than 30 years ago. Then he met 
a Chilean woman. In three months, he went 
from having almost no Spanish to great 
eloquence — albeit with terrible grammar, 
he says. As for his new Spanish-speaking 
friend? He married her.


Cameron Walker is a freelance writer in 
Santa Barbara, California. 


CORRECTIONS 

The caption for the main image 
accompanying the Careers Feature 
‘Change is in the air’ (Nature 532, 403–404;
2016) named the wrong silver-spotted
skipper. The picture is actually of
Epargyreus clarus, not Hesperia comma.
The Careers Feature ‘Take my advice’
(Nature 532, 531–533; 2016)
erroneously called Michael Lang a
co-founder of miLEAD. He was one of
the first consultants, but did not help to 
found the company. 


TURNING POINT 


Aerial archaeologist 


Sarah Parcak helped to establish the use
of satellite imagery to identify potential
archaeological sites. Last year, she was
awarded US$1 million from TED, the non-profit
organization devoted to spreading
ideas. Parcak, a remote-sensing expert at the
University of Alabama at Birmingham, plans 
to use the money to fulfil her dream of creating 
an online portal for citizen scientists to help 
discover archaeological treasures. 


How did you get the idea to apply satellite 
imaging to archaeology? 

My grandfather, Harold Young, a forestry 
professor at the University of Maine in Orono, 
was a pioneer in the use of aerial photogra- 
phy to look at forests. He would measure tree 
heights and look at the health of forests that 
were going to be used in paper manufacturing. I 
wondered how to apply that technology. He had 
passed away by the time I was an undergradu- 
ate. I was surprised to find that aerial imaging 
hadn’t been applied to archaeology before.


Were you the first to use this technology? 
There was a cohort of about six of us working 
mainly in the Middle East — in Turkey, Syria, 
Iraq and Egypt — to explore how to use satel- 
lite data, which has now helped practitioners 
move beyond their traditional focus on one site 
for an entire career. To understand sites in a 
broader context, it’s not efficient to do work on 
the ground. You have to think big, look from 
above and follow old river courses. 


How is your work changing archaeology? 
I hope that I’ve encouraged colleagues to think
of the scale of sites differently. Most recently, 
we discovered what may be a Viking settle- 
ment in Newfoundland, Canada. It was the 
first time that the technology had been used 
in the search for potential Norse sites. Using 
high-resolution satellite imagery, we found 
two potential sites that, when ground-truthed, 
yielded one likely Viking site. These techniques 
give you robust data that can be used to focus 
field efforts. 


What about its use in previously studied areas? 
Using high-resolution imagery, colleagues and 
I recently found what appeared to be a massive 
rectangular platform in one of the most well- 
surveyed archaeological zones in Petra, Jordan. 
Chris Tuttle, the executive director of the non- 
profit Council of American Overseas Research 
Centers in Washington DC, used drones to 
survey the object, and confirmed that it’s mas- 
sive — 80 metres by 40 metres — and dates to 
2,000 years ago. Despite the site having been
studied for 150 years, we missed what was 
probably a large ritual structure. Imagine what 
else we haven’t found.


How did TED impact your work? 

I gave a short TED talk in 2012 that aired on 
National Public Radio, and I was made a senior 
TED fellow two years later. The TED prize was 
very unexpected, to put it mildly. I got a message
last summer saying that I'd been nominated.
I filled out a ‘what would your wish be’
questionnaire. Then I had 18 minutes in February
to make a public case for Global Xplorer,
which is an online citizen-science platform to 
train an army of global explorers. I celebrated 
the work of colleagues but also gave the sense 
of real urgency that our field faces with so 
much destruction — from conflict to climate 
change — around the world. The prize completely
changed my life. It’s both an opportunity
and major responsibility.


What do you expect Global Xplorer to 
accomplish? 

Our team has scientific training and exper- 
tise; the bottleneck is the time spent search- 
ing through images. We have been scouring 
images to detect looting in the wake of the 2011 
Arab Spring, and it has been one of the most 
depressing things ever. I believe to my core that 
the only chance we have to save cultural herit- 
age sites around the world is to turn everyone 
into explorers. By turning people into what I 
call ‘space archaeologists’, they will develop a
sense of pride and ownership in preserving 
our cultural heritage. I think it’s one of the only 
chances to save the past.


INTERVIEW BY VIRGINIA GEWIN 


This interview has been edited for length and clarity. 
SIX NAMES FOR THE END


BY KEN HINCKLEY 


LAST GASP 

The silver dollar clunks through the innards 
of the coin-box and out of the return with a 
hollow ring. 

“What's wrong Daddy?” asks my daughter, 
Clematissa, from her faded stallion of hand- 
painted plastic. 

“Nothing, Tissa, don’t worry about it,” I
reply, using the diminutive she accepts only 
from me. 

In a fit of sentimentality I had named her
after the extinct vines that once trellised the 
gardens of the world. Now, the pout furrow- 
ing her brow mocks the geometry of pink 
tulle (improvised by her mother, of course) 
that ruffles her dress. 

“Then why won’t the horsies go?”

The abandoned shopping centre reeks of 
stale air salted with dust, baked too many 
years under an oven of cracked skylights that 
turn the firmament into a shattered dream. 
Punctured soup cans and protein-jerky 
wrappers litter the carousel, but I thought 
maybe one last go-round on Tissa’s favourite 
stallion would give her something to remem- 
ber me by. 

Guess not. 

“I think they’re just all tuckered out,
sweetie,” I say, as I roll up the sleeves of my
work-jersey. “They need to rest up for the 
big trip too.” 

Her lower lip curls and her eyes brim with 
tears. 

“Hang on a minute,” I say, with a wink. I fix
the leather strap around Tissa’s waist. Then I 
jump down, grab one of the tarnished brass 
poles, and push. There’s no music and the 
lights on the mirrors stare back at me like so 
many dead eyes, but at least the up-and-down 
of the horses accompanies my exertions. 

I run round and round till I’m drenched in
sweat, my jersey ripe with the fug of it. Out 
of shape, gasping for air. 

“Thank you, Daddy,” she says as I lift her
off the stallion, and the look in her eyes 
almost melts my heart. “You smell awful!” 


QUIETUS 
I’ve read the last story and there she is, my 
little angel gone to sleep. Her chest rises and 
falls atop the mattress and I pull the fleece 
blanket, patterned with pink and purple 
hearts, over her tiny body. 

It’s the last thing I can do for her. 

The apartment is quiet, too quiet. No hum 
of the fridge. No clink of a radiator warming
up. I settle on the musty-smelling carpet
next to Tissa. 
It’s gonna be a long, long night. 


TERMINUS 

Our marriage ended badly, this awkward 
union of Barbara and Wilbur, like
so many other things on this
world. Even our names 
sounded terrible together. 
It should have been a 
clue. 

But at least I won 
partial custody. For a 
time. 

Everyone's so busy 
getting ready to leave 
this place behind that 
they've forgotten their 
roots. Forgotten where they 
come from. It makes them light 
in the head, spending all their time 
in such rarefied air. 

The Lagrange staging-points are bustling. 
The brilliant flares of the departing starships 
make an inferno of the sky, night after night 
after night. 

I tell Tissa it’s the dawn of new hope, even 
though I can't come round to believing it 
myself. 

Because someone has to stay behind. 

And someone always does. 


FINIS 

They say we can't long survive here. That 
soon there'll be nothing left to save. The 
world’s done. Finito. Finis. 

Well, colour me sceptical. Or just stub- 
born. 

I intend to fight to the end. 

But what ties me to this piece of rock? 
What is it really that makes me stay? 

My father died here, and his father before 
him. I can taste the salt of the evaporating 
oceans on the air. 

I imagine what's left is filled with tears. 
All the tears of every man, woman and child 
who ever lived. I can't let that dry up. I can’t 
let that all be for nothing. 

The oceans will brim again. 


SAYONARA 
I don’t tell Tissa it’s the last visit. I don't tell 
her it’s the end. I don’t even say goodbye. 

I just drop her with the governess at the 
station. Kiss her on the cheek. Pretend it’s 
just another time I’m sending her back to 
Mommy. 

All this strength, this hard shell I put on, 
hides a weak man. 
Sayonara to me. 


RAPTURE 
Tissa’s mother comes down from the sky. 
The transport roars as it touches 
down, unsettling the earth, 
sending the dust off in a 
fruitless search of heaven. 
A stairway drops, a uni- 
formed steward leads 
a line of children to 
the bottom, and they 
ascend, clutching plush 
dolls and security blan- 
kets to their cheeks. 
I catch sight of Tissa as 
she hesitates near the top. 
She searches for me in the 
thin crowd, but I’m too far away, 
my wave too feeble and a moment 
too late to catch her attention. 

She's gone. 

My resolve wavers and suddenly I’m 
clomping down the aluminium bleachers, 
sprinting across the tarmac. 

But I’m not running to her. 

No, I’m putting as much distance between 
myself and the transport as my iffy knees 
will allow. 

I've made my choice. I’m going to forge 
something of this broken-down Earth. It’s 
where I belong. 

But that doesn’t mean I can bear to watch 
Tissa go. 

I know she’ll find her new home out there, 
in the stars, after I'm long dead. It’s a light- 
speed journey, or nearly so. My Tissa, my 
only daughter, is destined to live a million 
years. 

And when she gets there, I like to think 
that she might even remember this one-trick 
pony she once called Daddy, that she'll look 
back on an Earth born anew. 

But right now I can’t get far enough away. 

As I run, the transport accelerates past 
Mach 1, a pressure wave that fractures the 
sky. 

It makes the sound of my heart breaking 
in two. 


Ken Hinckley — writer, principal scientist 
(Microsoft Research) and editor-in-chief 
(Transactions on Computer-Human 